Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Authors: Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Gated SAEs on multiple models: a one-layer GELU activation language model [28], Pythia-2.8B [3] and Gemma-7B [18], and on multiple sites within models: MLP layer outputs, attention layer outputs, and residual stream activations. Across these models and sites, we find Gated SAEs to be a Pareto improvement over baseline SAEs holding training compute fixed (Fig. 1): they yield sparser decompositions at any desired level of reconstruction fidelity. We also conduct further follow-up ablations and investigations on a subset of these models and sites to better understand the differences between Gated SAEs and baseline SAEs. |
| Researcher Affiliation | Industry | Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda (all Google DeepMind) |
| Pseudocode | Yes | See Appendix J for pseudo-code for the forward pass and loss function. |
| Open Source Code | No | We are unable to provide open access to the activation datasets or code used to train the SAEs in our experiments. |
| Open Datasets | No | We are unable to provide open access to the activation datasets or code used to train the SAEs in our experiments. |
| Dataset Splits | No | The paper mentions evaluating models on 'held-out tokens', which implies a test set, but it does not provide specific percentages, sample counts, or explicit train/validation/test splits that would allow the data partitioning to be reproduced. |
| Hardware Specification | Yes | Individual SAEs were each trained on TPU-v3 slices with a 2x2 topology [20]. |
| Software Dependencies | No | The paper mentions the "Adam optimizer" but does not specify version numbers for any software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | We use the Adam optimizer with β2 = 0.999 and β1 = 0.0, following Templeton et al. [47], as we also find this to be a slight improvement to training. We use a learning rate warm-up. ... We use learning rate 0.0003 for all Gated SAE experiments, and the GELU-1L baseline experiment. ... For the Pythia-2.8B and Gemma-7B baseline SAE experiments, we divided the L2 loss by E||x||2, motivated by better hyperparameter transfer, and so changed learning rate to 0.001 and 0.00075. ... We generate activations from sequences of length 128 for GELU-1L, 2048 for Pythia-2.8B and 1024 for Gemma-7B. We use a batch size of 4096 for all runs. We use 300,000 training steps for GELU-1L and Gemma-7B runs, and 400,000 steps for Pythia-2.8B runs. |
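
The Pseudocode row points to Appendix J for the Gated SAE forward pass and loss function. As a reading aid, the following is a minimal PyTorch sketch of that architecture, assuming the paper's parameter names (`W_gate`, `r_mag`, `b_mag`, `W_dec`, `b_dec`); the widths, initialisation, and omitted details such as decoder-column normalisation are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedSAE(nn.Module):
    """Minimal sketch of a Gated SAE (cf. the paper's Appendix J pseudo-code).

    Parameter names follow the paper's notation; shapes and initialisation
    here are illustrative only.
    """

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_gate = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_gate = nn.Parameter(torch.zeros(d_sae))
        # The magnitude path shares W_gate up to a per-feature rescaling exp(r_mag).
        self.r_mag = nn.Parameter(torch.zeros(d_sae))
        self.b_mag = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        x_cent = x - self.b_dec
        # Gating path: decides which features fire (binary step, carries no gradient).
        pi_gate = x_cent @ self.W_gate.T + self.b_gate
        f_gate = (pi_gate > 0).to(x.dtype)
        # Magnitude path: estimates how strongly the active features fire.
        W_mag = torch.exp(self.r_mag)[:, None] * self.W_gate
        f_mag = F.relu(x_cent @ W_mag.T + self.b_mag)
        f = f_gate * f_mag                      # sparse feature activations
        x_hat = f @ self.W_dec.T + self.b_dec   # reconstruction
        return x_hat, f, pi_gate

    def loss(self, x: torch.Tensor, l1_coeff: float) -> torch.Tensor:
        x_hat, _, pi_gate = self(x)
        recon = (x - x_hat).pow(2).sum(-1).mean()
        # Sparsity penalty acts on the rectified gating pre-activations.
        sparsity = l1_coeff * F.relu(pi_gate).sum(-1).mean()
        # Auxiliary term: reconstruct from ReLU(pi_gate) through a *frozen* copy of
        # the decoder, training the gating path without moving W_dec / b_dec.
        x_hat_frozen = F.relu(pi_gate) @ self.W_dec.detach().T + self.b_dec.detach()
        aux = (x - x_hat_frozen).pow(2).sum(-1).mean()
        return recon + sparsity + aux
```

Because the binary gate blocks gradients, the gating path is trained only through the sparsity penalty and the frozen-decoder auxiliary term, which is how the paper separates detecting active features from estimating their magnitudes.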
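
Similarly, the Experiment Setup row can be read as a training configuration. The sketch below reuses the `GatedSAE` class above and wires up Adam with β1 = 0.0, β2 = 0.999, learning rate 0.0003, batch size 4096, and a learning-rate warm-up; the warm-up length, SAE widths, and sparsity coefficient are assumed values for illustration, and the random tensor stands in for a batch of language-model activations.

```python
import torch

# Training-loop sketch matching the quoted setup. Widths, warm-up length, and the
# sparsity coefficient are assumed for illustration, not taken from the paper.
LR = 3e-4               # Gated SAE runs and the GELU-1L baseline
BATCH_SIZE = 4096
TOTAL_STEPS = 300_000   # 400,000 for the Pythia-2.8B runs
WARMUP_STEPS = 1_000    # the paper uses a warm-up; its length is assumed here
L1_COEFF = 1e-3         # swept in the paper to trace the sparsity/fidelity frontier
D_MODEL, D_SAE = 2048, 16_384  # example widths

sae = GatedSAE(d_model=D_MODEL, d_sae=D_SAE)
# Adam with beta1 = 0.0 and beta2 = 0.999, as quoted above.
optimizer = torch.optim.Adam(sae.parameters(), lr=LR, betas=(0.0, 0.999))
# Linear warm-up from 0 to LR over WARMUP_STEPS, then constant.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / WARMUP_STEPS)
)

for step in range(TOTAL_STEPS):
    x = torch.randn(BATCH_SIZE, D_MODEL)  # stand-in for a batch of LM activations
    loss = sae.loss(x, l1_coeff=L1_COEFF)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```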