Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Scaling Sparse Feature Circuits For Studying In-Context Learning
Authors: Dmitrii Kharlapenko, Stepan Shabalin, Arthur Conmy, Neel Nanda
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TVC against four baseline approaches... We conducted extensive parameter sweeps... To validate the causal relevance of our decomposed task features, we conducted a series of steering experiments... To evaluate our SFC modifications, we measured faithfulness through ablation studies on our ICL task dataset. |
| Researcher Affiliation | Academia | ¹ETH Zurich, Switzerland; ²Georgia Institute of Technology, US |
| Pseudocode | Yes | Algorithm 1. Pseudocode for Task Vector Cleaning. Algorithm 2. Pseudocode for Sparse Feature Circuits indirect effect calculation. |
| Open Source Code | No | We will also plan to share SAE training codebase in JAX with a full suite of SAEs for Gemma 1 2B after the paper publication. Our SAEs and training code will be made public after paper publication. |
| Open Datasets | Yes | Our dataset for circuit finding is primarily derived from the function vectors paper Todd et al. (2024), which provides a diverse set of tasks for evaluating the existence and properties of function vectors in language models. We train residual and attention output SAEs as well as transcoders for layers 1-18 of the model on Fine Web Penedo et al. (2024). |
| Dataset Splits | Yes | The cleaning process is performed on a training batch of 24 pairs, with evaluation conducted on an additional 24 pairs. All prompts are zero-shot. Our methodology employed zero-shot prompts for task-execution features, measuring effects across a batch of 32 random pairs. |
| Hardware Specification | Yes | We use 4 v4 TPU chips running Jax Bradbury et al. (2018) (Equinox Kidger & Garcia (2021)) to train our SAEs. This is about 1 week of v4-8 TPU time. |
| Software Dependencies | No | The paper mentions software tools such as JAX, Equinox, Hugging Face's Flax LM implementations, and Penzai, but does not give version numbers for any of these dependencies. |
| Experiment Setup | Yes | Our Gemma 1 2B SAEs are trained with a learning rate of 1e-3 and Adam betas of 0.0 and 0.99 for 150M (~100) tokens of Fine Web Penedo et al. (2024). We used a learning rate of 0.15 with the Gemma 1 2B, Phi-3, and Gemma 2 2B 65k models, 0.3 with Gemma 2 2B 16k, and 0.05 with 200 early stopping steps for Gemma 2 9B. We established an optimal steering scale of 15, which we then applied consistently across all subsequent experiments. |
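The Experiment Setup row reports Adam betas of 0.0 and 0.99 for SAE training. With β₁ = 0, Adam keeps no momentum: the first-moment estimate is just the current gradient, so each step is the gradient scaled by a running RMS of recent gradients. The plain-Python sketch below shows one such update using the reported learning rate (1e-3) and betas. The function name `adam_step`, the ε value, and the list-of-floats parameter representation are illustrative assumptions; this is not the authors' JAX/Equinox training code.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.0, b2=0.99, eps=1e-8):
    """One Adam update over parallel lists of floats (hypothetical helper).

    t is the 1-based step count. Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        # First moment: with b1 = 0 this is exactly the current gradient.
        mi = b1 * mi + (1.0 - b1) * g
        # Second moment: exponential moving average of squared gradients.
        vi = b2 * vi + (1.0 - b2) * g * g
        # Bias correction; with b1 = 0, 1 - b1**t == 1 for every t >= 1.
        m_hat = mi / (1.0 - b1 ** t)
        v_hat = vi / (1.0 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


# Usage: on the first step each parameter moves by about lr * sign(grad).
params, m, v = adam_step([1.0, -2.0], [0.5, -0.1], [0.0, 0.0], [0.0, 0.0], t=1)
print(params)  # approximately [0.999, -1.999]
```

Setting β₁ to 0 trades the smoothing of momentum for updates that react immediately to the current batch; whether that was the authors' motivation is not stated in the row above.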