Finding Transformer Circuits With Edge Pruning

Authors: Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we frame automated circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution... We evaluate our approach, Edge Pruning, on four fronts: (1) we measure how faithfully the discovered circuits describe the behavior of the full model, (2) we verify if it can recover ground-truth circuits in Tracr models [Lindner et al., 2023] compiled from known program descriptions, (3) we evaluate how the method scales to more examples and (4) we assess its ability to find extremely sparse circuits in multi-billion parameter models. (A toy sketch of this edge-mask optimization follows the table.)
Researcher Affiliation | Academia | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen; Princeton Language and Intelligence (PLI), Princeton University; adithyab@princeton.edu, {awettig, dfriedman, danqic}@cs.princeton.edu
Pseudocode | No | The paper does not contain any explicit sections or figures labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | We release our code and data publicly at https://github.com/princeton-nlp/Edge-Pruning.
Open Datasets | Yes | Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023], Greater Than (GT) [Hanna et al., 2023], Gendered Pronoun (GP) [Mathwin et al., 2023], Tracr [Lindner et al., 2023], Boolean Expressions from the BBH [Suzgun et al., 2022] benchmark suite.
Dataset Splits | Yes | In a departure from this convention, we separate each dataset into train, validation, and test splits, to avoid artifacts caused by overfitting. We use the following tasks. Indirect Object Identification (IOI-t1 and IOI) [Wang et al., 2023] is a task with instances of the format "Friends Juana and Kristi found a mango at the bar. Kristi gave it to Juana." Conmy et al. [2023] use a version with a single template, which we refer to as IOI-t1; this version has 50 examples in each split. We also compare the methods on a variant (IOI) with 30 templates found on Hugging Face. We randomly select 200 examples each for the train and validation splits, and 36,084 examples for the test split.
Hardware Specification | Yes | The Tracr experiments use one NVIDIA A100 with 80 GB of memory. The GPT-2 experiments use either one NVIDIA A100 or one H100 (both 80 GB) each. The experiments of Table 1 all use one NVIDIA H100 for a fair runtime comparison. Each Code Llama-13B run utilizes 32 H100 GPUs and 600 gigabytes of CPU memory.
Software Dependencies | No | The paper mentions software such as the Adam optimizer, Hugging Face model classes, Flash Attention, and FSDP, but does not provide specific version numbers for these software dependencies or the programming language/environment.
Experiment Setup | Yes | For all tasks, we used a sequence length of 64 tokens with padding. A batch size of 32 was adopted, and the learning rate for both the edge and node masks, as well as for the Lagrangians λ for both, was set to 0.8. The total number of optimization steps was 3000, and the target edge and node sparsities were linearly increased starting from 0 over the first 2500 steps. Evaluation and checkpointing were performed every 64 steps, but we always used the final checkpoint to report results. (A hedged configuration sketch follows the table.)
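
The Research Type row describes circuit discovery framed as an optimization over edge masks. The snippet below is a minimal PyTorch sketch of that idea under simplifying assumptions: a toy block with a few upstream components, a sigmoid relaxation of the edge masks, and a plain mean penalty standing in for the paper's hard-concrete masks and Lagrangian-controlled target sparsity. All names (ToyEdgePrunedBlock, edge_logits, sparsity_loss) are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class ToyEdgePrunedBlock(nn.Module):
    """Toy illustration: a downstream component reads a gated mixture of
    upstream outputs, and the gates (edge masks) are learned jointly with
    a sparsity penalty. Names and the sigmoid relaxation are illustrative."""

    def __init__(self, dim: int, n_upstream: int):
        super().__init__()
        self.upstream = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_upstream)])
        self.downstream = nn.Linear(dim, dim)
        # One logit per upstream->downstream edge; sigmoid(logit) ~ "edge kept".
        self.edge_logits = nn.Parameter(torch.zeros(n_upstream))

    def forward(self, x):
        gates = torch.sigmoid(self.edge_logits)            # soft edge masks in [0, 1]
        outs = torch.stack([f(x) for f in self.upstream])  # (n_upstream, batch, dim)
        mixed = (gates[:, None, None] * outs).sum(dim=0)   # gate (prune) edges
        return self.downstream(mixed)

    def sparsity_loss(self):
        # Encourage most edges to switch off (a stand-in for the paper's
        # Lagrangian-enforced target sparsity).
        return torch.sigmoid(self.edge_logits).mean()


model = ToyEdgePrunedBlock(dim=16, n_upstream=4)
x, target = torch.randn(8, 16), torch.randn(8, 16)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), target) + 0.1 * model.sparsity_loss()
    loss.backward()
    opt.step()
```

In the paper the task loss is a faithfulness objective (matching the full model's behavior on the task data) rather than the MSE used here, and the edges are those of the full transformer computation graph rather than a toy block.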
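The Experiment Setup row reports the key hyperparameters. Below is a hedged sketch that collects them into a configuration dictionary together with the linear sparsity schedule the quote describes; the field and function names are assumptions chosen for illustration, not the argument names defined by the released code at https://github.com/princeton-nlp/Edge-Pruning.

```python
# Hypothetical field names; values are taken from the paper's reported setup.
edge_pruning_config = {
    "max_seq_length": 64,           # sequence length, with padding
    "batch_size": 32,
    "lr_edge_masks": 0.8,           # learning rate for edge masks
    "lr_node_masks": 0.8,           # learning rate for node masks
    "lr_lagrangians": 0.8,          # learning rate for the Lagrangian multipliers
    "total_steps": 3000,
    "sparsity_warmup_steps": 2500,  # target sparsities ramp linearly from 0
    "eval_every_steps": 64,         # evaluation / checkpointing interval
    "report_checkpoint": "final",   # results reported from the final checkpoint
}

def target_sparsity(step: int, final_sparsity: float, warmup_steps: int = 2500) -> float:
    """Linear schedule: ramp the target sparsity from 0 to its final value
    over the first `warmup_steps` optimization steps, then hold it constant."""
    return final_sparsity * min(step / warmup_steps, 1.0)
```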