Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Compute Optimal Inference and Provable Amortisation Gap in Sparse Autoencoders

Authors: Charles O’Neill, Alim Gumran, David Klindt

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results reveal substantial performance gains with minimal compute increases in correct inference of sparse codes. We evaluate four types of encoding methods on synthetic datasets with known ground-truth features. We evaluate these methods on two dimensions: alignment with true underlying sparse features and inference of the correct sparse codes, while accounting for computational costs during both training and inference. To demonstrate real-world applicability, we also train models on GPT-2 activations (Radford et al., 2019), showing that more complex methods such as MLPs can yield more interpretable features than SAEs in large language models.
Researcher Affiliation | Academia | Charles O'Neill¹, Alim Gumran², David Klindt³; ¹Australian National University, ²Nazarbayev University, ³Cold Spring Harbor Laboratory. Correspondence to: Charles O'Neill <EMAIL>.
Pseudocode | No | The paper describes methods like Sparse Coding (SC) with an iterative update rule `s_{t+1} = s_t + η∇L` in Section 3.3, but it does not present any clearly labeled pseudocode or algorithm blocks. It formulates objectives and describes steps in paragraph text or as mathematical equations, not in a structured, code-like format.
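For illustration, the iterative update described above can be written out concretely. The following is a minimal ISTA-style sketch, assuming the loss L is the standard ℓ1-regularised reconstruction objective ‖x − Ds‖² + λ‖s‖₁; the step size, iteration count, and soft-thresholding details are assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(s, thresh):
    """Proximal operator of the l1 penalty."""
    return np.sign(s) * np.maximum(np.abs(s) - thresh, 0.0)

def sc_inference(x, D, n_steps=100, eta=0.05, lam=1e-4):
    """Iterative sparse-code inference: a gradient step on the
    reconstruction term followed by soft-thresholding (ISTA).
    x: (M,) measurement; D: (M, N) dictionary; returns s: (N,) code."""
    s = np.zeros(D.shape[1])
    for _ in range(n_steps):
        grad = D.T @ (D @ s - x)  # gradient of 0.5 * ||x - D s||^2
        s = soft_threshold(s - eta * grad, eta * lam)
    return s
```

This amortisation-free inference runs an optimisation loop per input, in contrast to the single-forward-pass encoders evaluated elsewhere in the paper.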
Open Source Code | No | The paper does not contain an explicit statement by the authors about releasing their own code for the methodology described. It refers to a third-party tool for automated interpretability (Juang et al., 2024) and other related work (Lieberum et al., 2024) but provides no specific repository link or explicit code release statement for the presented research.
Open Datasets | Yes | To investigate the interpretability of more complex encoding techniques, we trained three distinct methods on 406 million tokens from OpenWebText: a sparse autoencoder with a single linear encoder layer and ReLU activation, a multilayer perceptron encoder with one hidden layer of width 8448, and a locally competitive algorithm following the approach of Olshausen & Field (1997) and Blumensath & Davies (2008).
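The two learned encoder families named in this row can be sketched as forward passes. This is a schematic sketch only: the weight shapes and hidden width below are placeholders (the paper's MLP hidden layer has width 8448), and the actual models are trained, not randomly initialised.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sae_encode(x, W_enc, b_enc):
    """Sparse autoencoder encoder: a single linear layer + ReLU."""
    return relu(x @ W_enc + b_enc)

def mlp_encode(x, W1, b1, W2, b2):
    """One-hidden-layer MLP encoder: two linear layers with ReLU
    nonlinearities (hidden width is a free hyperparameter)."""
    return relu(relu(x @ W1 + b1) @ W2 + b2)
```

Both map an activation vector to a nonnegative latent code in one forward pass; the MLP simply spends more compute per input than the SAE's single linear layer.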
Dataset Splits | Yes | Finally, we evaluate all four methods when both latent representations and dictionary are unknown. We use a dataset of 2048 samples, evenly split between training and testing sets, and conduct 5 independent runs of 100,000 steps.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It does not mention any specific hardware specifications or cloud computing instances with their configurations.
Software Dependencies | No | The paper mentions using 'Adam' for training and 'scikit-learn' for dictionary learning (specifically LARS), but it does not provide specific version numbers for these or any other software dependencies, making it difficult to precisely replicate the software environment.
Experiment Setup | Yes | All experiments were conducted using synthetic data with N = 16 sparse sources, M = 8 measurements, and K = 3 active components per timestep... We use a dataset of 2048 samples, evenly split between training and testing sets, and conduct 5 independent runs of 100,000 steps... All methods were trained using Adam with a learning rate of 3 × 10⁻⁴ and an L1 penalty of 1 × 10⁻⁴. Following Bricken et al. (2023) and Cunningham et al. (2023), we resampled dead neurons every 15,000 steps... We scaled up our synthetic experiments for the known Z case to N = 1000 sparse sources, M = 200 measurements, and K = 20 active components, training on 500,000 samples for 20,000 steps... we modified our training procedure to use minibatch processing (batch size 1024)... We train for 50,000 iterations with a learning rate of 1e-4.
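The synthetic setup quoted above (N = 16 sources, M = 8 measurements, K = 3 active components, 2048 samples split evenly) can be sketched as a data generator. The dictionary distribution and activation magnitudes below are assumptions; the excerpt does not specify them.

```python
import numpy as np

def make_synthetic(n_samples=2048, N=16, M=8, K=3, seed=0):
    """Generate sparse-coding data: N sparse sources, M measurements,
    exactly K active components per sample, even train/test split.
    Dictionary atoms and activation values are assumed distributions."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, N))
    D /= np.linalg.norm(D, axis=0, keepdims=True)    # unit-norm atoms (assumed)
    S = np.zeros((n_samples, N))
    for i in range(n_samples):
        idx = rng.choice(N, size=K, replace=False)   # K active sources
        S[i, idx] = rng.uniform(0.5, 1.5, size=K)    # positive activations (assumed)
    X = S @ D.T                                      # noiseless measurements
    half = n_samples // 2                            # even train/test split
    return X[:half], S[:half], X[half:], S[half:], D
```

With the ground-truth codes S and dictionary D in hand, encoder outputs can be scored directly against the true sparse features, as the paper's evaluation requires.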