Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hypothesis Testing the Circuit Hypothesis in LLMs

Authors: Claudia Shi, Nicolas Beltran Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David Blei

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply these tests to six circuits described in the research literature. We find that synthetic circuits (circuits that are hard-coded in the model) align with the idealized properties. Circuits discovered in Transformer models satisfy the criteria to varying degrees.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Columbia University, New York, USA; (2) Computer Science and Engineering, University of Michigan, Ann Arbor, USA; (3) FAR AI, USA
Pseudocode | Yes | Algorithm 1: Tail Test
Open Source Code | Yes | To facilitate future empirical studies of circuits, we created the circuitry package, a wrapper around the Transformer Lens library, which abstracts away lower-level manipulations of hooks and activations. The software is available at https://github.com/blei-lab/circuitry.
Open Datasets | Yes | We use the dataset provided by Wang et al. [2023] following the structure above. [...] We use the dataset provided by Conmy et al. [2023], which contains 40 sequences of 300 tokens from the validation split of Open Web Text [Gokaslan and Cohen, 2019], filtered to include instances of induction. [...] We use the dataset provided by Heimersheim and Janiak [2023] following the structure above.
Dataset Splits | Yes | We use the dataset provided by Conmy et al. [2023], which contains 40 sequences of 300 tokens from the validation split of Open Web Text [Gokaslan and Cohen, 2019], filtered to include instances of induction.
Hardware Specification | Yes | Our package is implemented efficiently, and can evaluate hundreds of circuits in a few minutes on a single A5000 GPU.
Software Dependencies | No | The paper mentions the Transformer Lens library and the authors' own circuitry package but gives no version numbers for either, which reproducibility requires.
Experiment Setup | Yes | We draw 100 random circuits to form the reference distribution for the sufficiency and partial necessity tests. For minimality, we draw 10,000 random edges for G-T and IOI and 1,000 random edges for the other circuits. In all experiments, we use Eq. 1 with the ℓ2 norm as the faithfulness metric. We set q to be 0.9 and α to be 0.05.
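
The experiment setup above compares a candidate circuit against a reference distribution of 100 random circuits, with q = 0.9 and α = 0.05. This summary does not reproduce Algorithm 1 or Eq. 1, so the following is only a minimal sketch of one way such a comparison could be run: a one-sided binomial (sign-style) test of whether the candidate beats more than a fraction q of the random circuits. The function names, the "lower loss is more faithful" convention, and the example scores are all assumptions made for illustration; they are not taken from the paper or from the circuitry package.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, q: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def quantile_tail_test(candidate_loss, reference_losses, q=0.9, alpha=0.05):
    """One-sided test that the candidate beats more than a fraction q of references.

    Assumption: a lower loss means a more faithful circuit.
    Null hypothesis: P(candidate beats a random reference circuit) <= q.
    Returns (wins, p_value, reject_null).
    """
    n = len(reference_losses)
    # Count the reference circuits that the candidate strictly outperforms.
    wins = sum(candidate_loss < r for r in reference_losses)
    # Probability of at least this many wins if the true win rate were exactly q.
    p_value = binomial_upper_tail(wins, n, q)
    return wins, p_value, p_value <= alpha


# Hypothetical faithfulness losses for 100 random reference circuits.
reference = [0.5 + 0.01 * i for i in range(100)]
wins, p_value, reject = quantile_tail_test(0.1, reference)
print(f"wins={wins}, p_value={p_value:.3g}, reject_null={reject}")
```

With 100 reference circuits and q = 0.9, this sketch rejects the null at α = 0.05 only when the candidate beats at least 96 of them, which shows how demanding a small reference sample makes the threshold.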