Finding Transformer Circuits With Edge Pruning
Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we frame automated circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution... We evaluate our approach, Edge Pruning, on four fronts: (1) we measure how faithfully the discovered circuits describe the behavior of the full model, (2) we verify if it can recover ground-truth circuits in Tracr models [Lindner et al., 2023] compiled from known program descriptions, (3) we evaluate how the method scales to more examples, and (4) we assess its ability to find extremely sparse circuits in multi-billion parameter models. |
| Researcher Affiliation | Academia | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen, Princeton Language and Intelligence (PLI), Princeton University; adithyab@princeton.edu, {awettig, dfriedman, danqic}@cs.princeton.edu |
| Pseudocode | No | The paper does not contain any explicit sections or figures labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We release our code and data publicly at https://github.com/princeton-nlp/Edge-Pruning. |
| Open Datasets | Yes | Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023], Greater Than (GT) [Hanna et al., 2023], Gendered Pronoun (GP) [Mathwin et al., 2023], Tracr [Lindner et al., 2023], Boolean Expressions from the BBH [Suzgun et al., 2022] benchmark suite. |
| Dataset Splits | Yes | In a departure from this convention, we separate each dataset into train, validation, and test splits, to avoid artifacts caused by overfitting. We use the following tasks. Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023] is a task with instances of the format Friends Juana and Kristi found a mango at the bar. Kristi gave it to Juana. Conmy et al. [2023] use a version with a single template, which we refer to as IOI-t1; this version has 50 examples in each split. We also compare the methods on a variant (IOI) with 30 templates found on Hugging Face. We randomly select 200 examples each for the train and validation splits, and 36,084 examples for the test split. (A minimal splitting sketch appears after the table.) |
| Hardware Specification | Yes | The Tracr experiments use one NVIDIA A100 with 80 GB of memory. The GPT-2 experiments use either one NVIDIA A100 or one H100 (both 80 GB) each. The experiments of Table 1 all use one NVIDIA H100 for a fair runtime comparison. Each Code Llama-13B run utilizes 32 H100 GPUs and 600 gigabytes of CPU memory. |
| Software Dependencies | No | The paper mentions software like Adam optimizer, Hugging Face model classes, Flash Attention, and FSDP, but does not provide specific version numbers for these software dependencies or the programming language/environment. |
| Experiment Setup | Yes | For all tasks, we used a sequence length of 64 tokens with padding. A batch size of 32 was adopted, and the learning rate for both the edge and node masks, as well as for the Lagrangian multipliers λ for both, was set to 0.8. The total number of optimization steps was 3000, and the target edge and node sparsities were linearly increased from 0 over the first 2500 steps. Evaluation and checkpointing were performed every 64 steps, but we always used the final checkpoint to report results. (A hedged configuration sketch based on these values appears after the table.) |
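
The Dataset Splits row reports 200 train, 200 validation, and 36,084 test examples for the multi-template IOI variant. The snippet below is a minimal sketch, not the authors' released code, of how such a split could be reproduced from a pool of IOI prompts; `make_splits` and its arguments are illustrative names rather than identifiers from the Edge-Pruning repository.

```python
# Minimal sketch (assumed, not from the paper's code): carve a pool of prompts
# into the reported 200 train / 200 validation / remaining-test splits.
import random

def make_splits(examples, n_train=200, n_val=200, seed=0):
    """Shuffle and slice: first n_train for training, next n_val for validation,
    everything else (e.g. 36,084 IOI prompts) for the test split."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```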
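
The Experiment Setup row quotes the key hyperparameters. The sketch below simply collects those reported values into one configuration object for reference; the class and field names are assumptions and do not come from the released Edge-Pruning code.

```python
# Hedged sketch of the quoted hyperparameters; names are illustrative only.
from dataclasses import dataclass

@dataclass
class EdgePruningConfig:
    max_seq_length: int = 64            # sequence length with padding
    batch_size: int = 32
    mask_lr: float = 0.8                # learning rate for edge and node masks
    lagrangian_lr: float = 0.8          # learning rate for the Lagrangian multipliers
    total_steps: int = 3000
    sparsity_warmup_steps: int = 2500   # target sparsities ramp linearly from 0
    eval_every: int = 64                # evaluation/checkpoint interval; final checkpoint reported
```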