Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Finding Transformer Circuits With Edge Pruning
Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we frame automated circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution...We evaluate our approach, Edge Pruning, on four fronts: (1) we measure how faithfully the discovered circuits describe the behavior of the full model, (2) we verify if it can recover ground-truth circuits in Tracr models [Lindner et al., 2023] compiled from known program descriptions, (3) we evaluate how the method scales to more examples and (4) we assess its ability to find extremely sparse circuits in multi-billion parameter models. |
| Researcher Affiliation | Academia | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen. Princeton Language and Intelligence (PLI), Princeton University. |
| Pseudocode | No | The paper does not contain any explicit sections or figures labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We release our code and data publicly at https://github.com/princeton-nlp/Edge-Pruning. |
| Open Datasets | Yes | Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023], Greater Than (GT) [Hanna et al., 2023], Gendered Pronoun (GP) [Mathwin et al., 2023], Tracr [Lindner et al., 2023], Boolean Expressions from the BBH [Suzgun et al., 2022] benchmark suite. |
| Dataset Splits | Yes | In a departure from this convention, we separate each dataset into train, validation, and test splits, to avoid artifacts caused by overfitting. We use the following tasks. Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023] is a task with instances of the format "Friends Juana and Kristi found a mango at the bar. Kristi gave it to Juana." Conmy et al. [2023] use a version with a single template, which we refer to as IOI-t1; this version has 50 examples in each split. We also compare the methods on a variant (IOI) with 30 templates found on Hugging Face. We randomly select 200 examples each for the train and validation splits, and 36,084 examples for the test split. |
| Hardware Specification | Yes | The Tracr experiments use one NVIDIA A100 with 80 GB of memory. The GPT-2 experiments use either one NVIDIA A100 or one H100 (both 80 GB) each. The experiments of Table 1 all use one NVIDIA H100 for a fair runtime comparison. Each Code Llama-13B run utilizes 32 H100 GPUs and 600 gigabytes of CPU memory. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, Hugging Face model classes, Flash Attention, and FSDP, but does not give version numbers for these dependencies or for the programming language and environment. |
| Experiment Setup | Yes | For all tasks, we used a sequence length of 64 tokens with padding. A batch size of 32 was adopted, and the learning rate for both the edge and node masks, as well as for the lagrangians λ for both, was set to 0.8. The total number of optimization steps was 3000, and the target edge and node sparsities were linearly increased starting from 0 over the first 2500 steps. Evaluation and checkpointing were performed every 64 steps but we always used the final checkpoint to report results. |
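
The linear sparsity schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `target_sparsity` and the `final_target` parameter are hypothetical, and the final target value is an assumption because the quoted excerpt does not state it. Only the 3000 total steps and the 2500-step ramp come from the excerpt.

```python
TOTAL_STEPS = 3000   # total optimization steps (from the quoted setup)
RAMP_STEPS = 2500    # steps over which the target sparsity is ramped (from the quoted setup)


def target_sparsity(step: int, final_target: float, ramp_steps: int = RAMP_STEPS) -> float:
    """Return the target sparsity at a given step.

    The target rises linearly from 0 at step 0 to `final_target` at
    `ramp_steps`, then stays constant. `final_target` is a hypothetical
    parameter; the excerpt does not give its value.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if not 0.0 <= final_target <= 1.0:
        raise ValueError("final_target must lie in [0, 1]")
    fraction = min(step / ramp_steps, 1.0)
    return final_target * fraction


if __name__ == "__main__":
    # Print the schedule at a few steps for an assumed final target of 0.9.
    for step in (0, 1250, 2500, TOTAL_STEPS):
        print(step, target_sparsity(step, final_target=0.9))
```

Under this sketch the last 500 of the 3000 steps run at a constant target, which matches the description of a ramp that ends before optimization does.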