Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Authors: Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to standard SAEs, e2e SAEs offer a Pareto improvement: They explain more network performance, require fewer total features, and require fewer simultaneously active features per datapoint, all with no cost to interpretability. We explore geometric and qualitative differences between e2e SAE features and standard SAE features. |
| Researcher Affiliation | Collaboration | Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey. Affiliations: Apollo Research; ML Alignment & Theory Scholars (MATS); University of Queensland. |
| Pseudocode | No | The paper describes the mathematical formulations of the SAE training losses (L_local, L_e2e, and L_e2e+downstream) in Section 2, but it does not include any explicitly labeled pseudocode or algorithm blocks (a hedged PyTorch sketch of these losses appears below the table). |
| Open Source Code | Yes | We release our library for training e2e SAEs and reproducing our analysis at https://github.com/ApolloResearch/e2e_sae. |
| Open Datasets | Yes | For our GPT2-small experiments, we train SAEs with each type of loss function on 400k samples of context size 1024 from the Open Web Text dataset [Gokaslan and Cohen, 2019] over a range of sparsity coefficients λ. We stream the dataset https://huggingface.co/datasets/apollo-research/Skylion007-openwebtext-tokenizer-gpt2, which is a tokenized version of Open Web Text [Gokaslan and Cohen, 2019] (released under the license CC0-1.0). A streaming sketch follows the table. |
| Dataset Splits | No | The paper describes the training data and evaluates models on 500 samples of the Open Web Text dataset, but it does not explicitly specify a distinct validation split, by size or percentage, separate from a test set for purposes such as hyperparameter tuning. |
| Hardware Specification | Yes | We used NVIDIA A100 GPUs with 80GB VRAM (although the GPU was saturated when using smaller batch sizes that used 40GB VRAM or less). |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer', the 'TransformerLens library', and 'Hugging Face's Transformers library', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We train for 400k samples of context size 1024 on Open Web Text with an effective batch size of 16. We use a learning rate of 5e-4, with a warmup of 20k samples, a cosine schedule decaying to 10% of the max learning rate, and the Adam optimizer [Kingma and Ba, 2017] with default hyperparameters. A sketch of this optimizer and schedule follows the table. |
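To make the loss descriptions in the Pseudocode row concrete, here is a minimal PyTorch sketch of the three training objectives (L_local, L_e2e, L_e2e+downstream) as the paper describes them at a high level. It is an illustrative paraphrase under stated assumptions, not the released implementation: `lam` stands in for the sparsity coefficient λ, `beta` for a downstream-reconstruction weight, and all tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F


def sae_losses(acts, recon_acts, coeffs, orig_logits, new_logits,
               orig_downstream, new_downstream, lam, beta):
    """Illustrative sketch of L_local, L_e2e, and L_e2e+downstream (not the paper's exact notation)."""
    # L1 sparsity penalty on the SAE's dictionary coefficients
    sparsity = coeffs.abs().sum(dim=-1).mean()

    # L_local: mean-squared reconstruction error at the SAE's own layer
    loss_local = F.mse_loss(recon_acts, acts) + lam * sparsity

    # L_e2e: KL divergence between the original output distribution and the
    # distribution produced when the SAE reconstruction is patched back in
    kl = F.kl_div(new_logits.log_softmax(dim=-1),
                  orig_logits.log_softmax(dim=-1),
                  log_target=True, reduction="batchmean")
    loss_e2e = kl + lam * sparsity

    # L_e2e+downstream: additionally match activations at later layers
    downstream_mse = sum(F.mse_loss(new, orig)
                         for new, orig in zip(new_downstream, orig_downstream))
    loss_e2e_downstream = kl + lam * sparsity + beta * downstream_mse

    return loss_local, loss_e2e, loss_e2e_downstream
```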
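The Open Datasets row points to a pre-tokenized Open Web Text dataset on the Hugging Face Hub that the authors stream during training. A minimal sketch of streaming it with the `datasets` library; the `train` split name and `input_ids` column are assumptions, not taken from the paper.

```python
from datasets import load_dataset

# Stream the tokenized Open Web Text dataset rather than downloading it in full.
dataset = load_dataset(
    "apollo-research/Skylion007-openwebtext-tokenizer-gpt2",
    split="train",        # assumed split name
    streaming=True,
)

for sample in dataset.take(2):
    # "input_ids" is an assumed column name; context size is 1024 in the paper.
    print(len(sample["input_ids"]))
```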
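The Experiment Setup row specifies Adam at a learning rate of 5e-4, a 20k-sample warmup, and a cosine schedule decaying to 10% of the maximum learning rate. Below is a sketch of that schedule using `torch.optim.lr_scheduler.LambdaLR`, assuming one optimizer step per effective batch of 16 samples; the `model` here is a stand-in module, not the released SAE.

```python
import math
import torch

# Numbers from the setup above: 400k samples, 20k-sample warmup, effective batch size 16.
TOTAL_SAMPLES, WARMUP_SAMPLES, BATCH_SIZE = 400_000, 20_000, 16
total_steps = TOTAL_SAMPLES // BATCH_SIZE
warmup_steps = WARMUP_SAMPLES // BATCH_SIZE

model = torch.nn.Linear(768, 768 * 60)  # hypothetical stand-in, not the actual SAE
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # default Adam hyperparameters


def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup to the max learning rate
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # cosine decay from 100% down to 10% of the max learning rate
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# In the training loop: optimizer.step() followed by scheduler.step() each batch.
```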