Hi, I’m trying to reproduce Figure 2 for the L15R-8x SAE on Llama-3.1-8B.
Setup
LM: meta-llama/Llama-3.1-8B
SAE release: llama_scope_lxr_8x, sae_id = l15r_8x
Script: run_llamascope_inference.py
Cmd (simplified):
python run_llamascope_inference.py \
  --model-name meta-llama/Llama-3.1-8B \
  --sae-release llama_scope_lxr_8x \
  --layers 15 \
  --dataset-cache-path slimpajama_test_cache.jsonl.gz \
  --dataset-max-prompts 5120 \
  --prompts-per-batch 96 \
  --max-seq-len 512
Dataset: SlimPajama cached split, 5120 prompts, max seq len 512 (~1.84M tokens)
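In case the loading path matters, this is roughly what the script does to pull layer-15 residual activations before computing any metrics. This is a minimal sketch only: run_llamascope_inference.py is my own script, the hook name is my guess for the L15R hook point, and I'm assuming a TransformerLens + SAELens setup (the SAE.from_pretrained return signature differs across SAELens versions):

```python
# Sketch of the activation-harvesting side of my script (assumptions noted above).
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: TransformerLens accepts this checkpoint name directly.
model = HookedTransformer.from_pretrained("meta-llama/Llama-3.1-8B", device=device)

# Assumption: SAELens-style loading; some versions return (sae, cfg, sparsity), others just the SAE.
loaded = SAE.from_pretrained(release="llama_scope_lxr_8x", sae_id="l15r_8x", device=device)
sae = loaded[0] if isinstance(loaded, tuple) else loaded

# Assumption: L15R corresponds to the post-block residual stream at layer 15.
hook_name = "blocks.15.hook_resid_post"

def get_resid_acts(tokens: torch.Tensor) -> torch.Tensor:
    """Run a batch of token ids and return layer-15 residual activations [batch, seq, d_model]."""
    with torch.no_grad():
        _, cache = model.run_with_cache(tokens, names_filter=hook_name)
    return cache[hook_name]
```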
Observed metrics (summary JSON)
layer: L15R-8x
explained_variance ≈ 0.702
mean_l0 ≈ 32.3
tokens_evaluated ≈ 1.84M
activation_rate ≈ 9.9e-4, feature_coverage ≈ 0.983
(Δ LM loss is not implemented in my eval script yet.)
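For clarity on what the numbers above mean, this is roughly how my script computes explained_variance and mean_l0 from the harvested activations. The definitions below are mine and may not match the ones used for Fig. 2, which is part of what I'm asking:

```python
import torch

@torch.no_grad()
def sae_batch_metrics(acts: torch.Tensor, sae) -> dict:
    """Explained variance and mean L0 for a batch of residual activations acts [n_tokens, d_model]."""
    feats = sae.encode(acts)   # [n_tokens, d_sae] feature activations
    recon = sae.decode(feats)  # [n_tokens, d_model] reconstruction

    # Explained variance: 1 - ||x - x_hat||^2 / ||x - mean(x)||^2, averaged over tokens.
    resid_var = (acts - recon).pow(2).sum(dim=-1)
    total_var = (acts - acts.mean(dim=0, keepdim=True)).pow(2).sum(dim=-1)
    explained_variance = (1.0 - resid_var / total_var).mean().item()

    # L0: mean number of active (nonzero) features per token.
    mean_l0 = (feats > 0).float().sum(dim=-1).mean().item()

    return {"explained_variance": explained_variance, "mean_l0": mean_l0}
```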
Comparison to paper
From Fig. 2, my reading is that the L15R-8x (TopK/JumpReLU) point sits around:
mean L0 ≈ 50
EV ≈ 0.72
So my EV is in the right ballpark but a bit lower, while my mean L0 is noticeably below the nominal top_k = 50.
Questions
What exact eval configuration (activation scaling, dataset, token count, gating) was used to produce Fig.2 for llama_scope_lxr_8x?
Should I be using a specific ActivationScaler config from the SAE release when calling encode/decode?
Is there a reference eval script to reproduce Fig. 2, including Δ LM loss? (My rough plan for Δ LM loss is sketched below.)
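For question 3, in case it helps to compare against what I would otherwise implement myself: my plan for Δ LM loss is to splice the SAE reconstruction back into the layer-15 residual stream and compare cross-entropy against the clean run. This is a sketch under the same hook-name assumption as above, not a reference implementation:

```python
import torch

@torch.no_grad()
def delta_lm_loss(model, sae, tokens: torch.Tensor,
                  hook_name: str = "blocks.15.hook_resid_post") -> float:
    """Return (CE loss with SAE reconstruction spliced in) minus (clean CE loss) for one batch."""
    clean_loss = model(tokens, return_type="loss").item()

    def replace_with_recon(resid, hook):
        # Substitute the residual stream with its SAE reconstruction at this hook point.
        return sae.decode(sae.encode(resid)).to(resid.dtype)

    spliced_loss = model.run_with_hooks(
        tokens,
        return_type="loss",
        fwd_hooks=[(hook_name, replace_with_recon)],
    ).item()

    return spliced_loss - clean_loss
```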
Thanks!