Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hypothesis Testing the Circuit Hypothesis in LLMs
Authors: Claudia Shi, Nicolas Beltran Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David Blei
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these tests to six circuits described in the research literature. We find that synthetic circuits (circuits that are hard-coded in the model) align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Columbia University, New York, USA 2Computer Science and Engineering, University of Michigan, Ann Arbor, USA 3FAR AI, USA |
| Pseudocode | Yes | Algorithm 1: Tail Test |
| Open Source Code | Yes | To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the Transformer Lens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry. |
| Open Datasets | Yes | We use the dataset provided by Wang et al. [2023] following the structure above. [...] We use the dataset provided by Conmy et al. [2023] which contains 40 sequences of 300 tokens from the validation split of Open Web Text Gokaslan and Cohen [2019] filtered to include instances of induction. [...] We use the dataset provided by Heimersheim and Janiak [2023] following the structure above. |
| Dataset Splits | Yes | We use the dataset provided by Conmy et al. [2023] which contains 40 sequences of 300 tokens from the validation split of Open Web Text Gokaslan and Cohen [2019] filtered to include instances of induction. |
| Hardware Specification | Yes | Our package is implemented efficiently, and can evaluate hundreds of circuits in a few minutes on a single A5000 GPU. |
| Software Dependencies | No | The paper mentions using the 'Transformer Lens' library and their own 'circuitry package', but it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We draw 100 random circuits to form the reference distribution for the sufficiency and partial necessity tests. For minimality, we draw 10,000 random edges for G-T and IOI and 1000 random edges for the other circuits. In all experiments, we use Eq. 1 with ℓ2 norm as the faithfulness metric. We set q to be 0.9 and α to be 0.05. |
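
The Experiment Setup row quotes a reference distribution of 100 random circuits, a quantile q = 0.9, and a significance level α = 0.05. As a rough illustration of how such settings could fit together, the sketch below runs a one-sided binomial test of whether a candidate circuit is more faithful than a q fraction of random circuits. This is an assumption made for illustration only: it is not the paper's Algorithm 1 and not the `circuitry` package API. The names `binomial_upper_tail` and `beats_quantile` are hypothetical, and the sketch assumes a lower score means a more faithful circuit (for example, an ℓ2 distance between circuit and model outputs).

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def beats_quantile(candidate_score, reference_scores, q=0.9, alpha=0.05):
    """One-sided binomial test (illustrative, hypothetical helper).

    H0: the candidate is more faithful than at most a fraction q of random circuits.
    Assumes lower scores are better, e.g. an l2 distance to the full model's output.
    Returns (wins, p_value, reject_h0).
    """
    n = len(reference_scores)
    # Count the random reference circuits that the candidate strictly outperforms.
    wins = sum(candidate_score < r for r in reference_scores)
    # Probability of seeing at least this many wins if the true win rate were q.
    p_value = binomial_upper_tail(wins, n, q)
    return wins, p_value, p_value <= alpha


if __name__ == "__main__":
    # Made-up scores purely for demonstration; not results from the paper.
    references = [1.0 + 0.01 * i for i in range(100)]
    print(beats_quantile(0.5, references))
```

With 100 reference draws and q = 0.9, the candidate has to outperform nearly all of the random circuits before this kind of test rejects at α = 0.05, which is why the size of the reference distribution matters for the test's power.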