Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Authors: Charles O’Neill, Alim Gumran, David Klindt

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We evaluate four types of encoding methods on synthetic datasets with known ground-truth features. We evaluate these methods on two dimensions: alignment with true underlying sparse features and inference of the correct sparse codes, while accounting for computational costs during both training and inference. To demonstrate real-world applicability, we also train models on GPT-2 activations (Radford et al., 2019), showing that more complex methods such as MLPs can yield more interpretable features than SAEs in large language models.
Researcher Affiliation | Academia | Charles O'Neill¹, Alim Gumran², David Klindt³; ¹Australian National University, ²Nazarbayev University, ³Cold Spring Harbor Laboratory. Correspondence to: Charles O'Neill <EMAIL>.
Pseudocode | No | The paper describes methods like Sparse Coding (SC) with an iterative update rule `s_{t+1} = s_t + η∇L` in Section 3.3, but it does not present any clearly labeled pseudocode or algorithm blocks. It formulates objectives and describes steps in paragraph text or as mathematical equations, not in a structured, code-like format.
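For illustration, the iterative update described above can be written out concretely. The following is a minimal ISTA-style sketch, assuming the loss L is the standard ℓ1-regularised reconstruction objective ‖x − Ds‖² + λ‖s‖₁; the step size, iteration count, and soft-thresholding details are assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(s, thresh):
    """Proximal operator of the l1 penalty."""
    return np.sign(s) * np.maximum(np.abs(s) - thresh, 0.0)

def sc_inference(x, D, n_steps=100, eta=0.05, lam=1e-4):
    """Iterative sparse-code inference: a gradient step on the
    reconstruction term followed by soft-thresholding (ISTA).
    x: (M,) measurement; D: (M, N) dictionary; returns s: (N,) code."""
    s = np.zeros(D.shape[1])
    for _ in range(n_steps):
        grad = D.T @ (D @ s - x)  # gradient of 0.5 * ||x - D s||^2
        s = soft_threshold(s - eta * grad, eta * lam)
    return s
```

This amortisation-free inference runs an optimisation loop per input, in contrast to the single-forward-pass encoders evaluated elsewhere in the paper.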
Open Source Code | No | The paper does not contain an explicit statement by the authors about releasing their own code for the methodology described. It refers to a third-party tool for automated interpretability (Juang et al., 2024) and other related work (Lieberum et al., 2024) but provides no specific repository link or explicit code release statement for the presented research.
Open Datasets | Yes | To investigate the interpretability of more complex encoding techniques, we trained three distinct methods on 406 million tokens from OpenWebText: a sparse autoencoder with a single linear encoder layer and ReLU activation, a multilayer perceptron encoder with one hidden layer of width 8448, and a locally competitive algorithm following the approach of Olshausen & Field (1997) and Blumensath & Davies (2008).
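The two learned encoder families named in this row can be sketched as forward passes. This is a schematic sketch only: the weight shapes and hidden width below are placeholders (the paper's MLP hidden layer has width 8448), and the actual models are trained, not randomly initialised.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sae_encode(x, W_enc, b_enc):
    """Sparse autoencoder encoder: a single linear layer + ReLU."""
    return relu(x @ W_enc + b_enc)

def mlp_encode(x, W1, b1, W2, b2):
    """One-hidden-layer MLP encoder: two linear layers with ReLU
    nonlinearities (hidden width is a free hyperparameter)."""
    return relu(relu(x @ W1 + b1) @ W2 + b2)
```

Both map an activation vector to a nonnegative latent code in one forward pass; the MLP simply spends more compute per input than the SAE's single linear layer.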
Dataset Splits | Yes | Finally, we evaluate all four methods when both latent representations and dictionary are unknown. We use a dataset of 2048 samples, evenly split between training and testing sets, and conduct 5 independent runs of 100,000 steps.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It does not mention any specific hardware specifications or cloud computing instances with their configurations.
Software Dependencies | No | The paper mentions using 'Adam' for training and 'scikit-learn' for dictionary learning (specifically LARS), but it does not provide specific version numbers for these or any other software dependencies, making it difficult to precisely replicate the software environment.
Experiment Setup | Yes | All experiments were conducted using synthetic data with N = 16 sparse sources, M = 8 measurements, and K = 3 active components per timestep... We use a dataset of 2048 samples, evenly split between training and testing sets, and conduct 5 independent runs of 100,000 steps... All methods were trained using Adam with a learning rate of 3 × 10⁻⁴ and an L1 penalty of 1 × 10⁻⁴. Following Bricken et al. (2023) and Cunningham et al. (2023), we resampled dead neurons every 15,000 steps... We scaled up our synthetic experiments for the known Z case to N = 1000 sparse sources, M = 200 measurements, and K = 20 active components, training on 500,000 samples for 20,000 steps... we modified our training procedure to use minibatch processing (batch size 1024)... We train for 50,000 iterations with a learning rate of 1e-4.
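The synthetic setup quoted above (N = 16 sources, M = 8 measurements, K = 3 active components, 2048 samples split evenly) can be sketched as a data generator. The dictionary distribution and activation magnitudes below are assumptions; the excerpt does not specify them.

```python
import numpy as np

def make_synthetic(n_samples=2048, N=16, M=8, K=3, seed=0):
    """Generate sparse-coding data: N sparse sources, M measurements,
    exactly K active components per sample, even train/test split.
    Dictionary atoms and activation values are assumed distributions."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, N))
    D /= np.linalg.norm(D, axis=0, keepdims=True)    # unit-norm atoms (assumed)
    S = np.zeros((n_samples, N))
    for i in range(n_samples):
        idx = rng.choice(N, size=K, replace=False)   # K active sources
        S[i, idx] = rng.uniform(0.5, 1.5, size=K)    # positive activations (assumed)
    X = S @ D.T                                      # noiseless measurements
    half = n_samples // 2                            # even train/test split
    return X[:half], S[:half], X[half:], S[half:], D
```

With the ground-truth codes S and dictionary D in hand, encoder outputs can be scored directly against the true sparse features, as the paper's evaluation requires.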