Circuit Component Reuse Across Tasks in Transformer Language Models

Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we present evidence that insights (both low-level findings about specific heads and higher-level findings about general algorithms) can indeed generalize across tasks. Specifically, we study the circuit discovered in Wang et al. (2022) for the Indirect Object Identification (IOI) task and 1) show that it reproduces on a larger GPT2 model, and 2) that it is mostly reused to solve a seemingly different task: Colored Objects (Ippolito & Callison-Burch, 2023). We provide evidence that the process underlying both tasks is functionally very similar, and contains about a 78% overlap in in-circuit attention heads. We further present a proof-of-concept intervention experiment, in which we adjust four attention heads in middle layers in order to repair the Colored Objects circuit and make it behave like the IOI circuit. In doing so, we boost accuracy from 49.6% to 93.7% on the Colored Objects task and explain most sources of error.
Researcher Affiliation | Academia | Jack Merullo, Department of Computer Science, Brown University, jack_merullo@brown.edu; Carsten Eickhoff, School of Medicine, University of Tübingen, carsten.eickhoff@uni-tuebingen.de; Ellie Pavlick, Department of Computer Science, Brown University, ellie_pavlick@brown.edu
Pseudocode | No | The paper describes procedures like path patching in prose (Section 2.2 and Appendix A) but does not include any structured pseudocode or algorithm blocks.
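For readers assessing what such a procedure involves, the snippet below is a minimal sketch of single-head activation patching (a simplified relative of the path patching the paper describes), written with the TransformerLens library the authors use. The prompt pair and the layer/head choice are illustrative assumptions, not taken from the paper.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-medium")

# Illustrative IOI-style prompt pair: the corrupted prompt swaps the names,
# so the correct completion flips from " Mary" to " John".
clean_tokens = model.to_tokens("When Mary and John went to the store, John gave a drink to")
corrupted_tokens = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")

# Cache all activations from the corrupted run.
_, corrupted_cache = model.run_with_cache(corrupted_tokens)

LAYER, HEAD = 9, 6  # illustrative head; the paper studies many heads this way

def patch_head_output(z, hook):
    # z: [batch, seq, n_heads, d_head]; overwrite one head's output with the
    # corresponding activation from the corrupted run.
    z[:, :, HEAD, :] = corrupted_cache[hook.name][:, :, HEAD, :]
    return z

patched_logits = model.run_with_hooks(
    clean_tokens,
    fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", patch_head_output)],
)

# How much the logit difference for the correct name drops under patching is a
# measure of how important this head is for the task.
mary, john = model.to_single_token(" Mary"), model.to_single_token(" John")
print("patched logit diff:", (patched_logits[0, -1, mary] - patched_logits[0, -1, john]).item())
```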
Open Source Code | Yes | Code available at: https://github.com/jmerullo/circuit_reuse
Open Datasets | Yes | Indirect Object Identification: We use the IOI task from Wang et al. (2022), which requires a model to predict the name of the indirect object in a sentence. ... Colored Objects: The Colored Objects task requires the model to generate the color of an object that was previously described in context among other objects. An example is shown in Figure 1. We modify the Reasoning about Colored Objects task from the BIG-Bench dataset (Ippolito & Callison-Burch, 2023) to make it slightly simpler and always tokenize to the same length. ... https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/reasoning_about_colored_objects
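As a toy illustration of the "always tokenize to the same length" constraint mentioned above (this is not the authors' generation code; the template and word lists are made up), one can fill a Colored Objects-style template with single-token color and object words and check that every prompt has the same token count:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Hypothetical template; all fillers are single GPT-2 tokens so length is constant.
template = ("On the table, I see a {c1} {o1}, a {c2} {o2}, and a {c3} {o3}. "
            "What color is the {o1}?")
colors = ["red", "blue", "green", "yellow", "black"]
objects = ["pen", "cup", "book", "plate", "ball"]

lengths = set()
for i in range(3):
    prompt = template.format(c1=colors[i], o1=objects[i],
                             c2=colors[i + 1], o2=objects[i + 1],
                             c3=colors[i + 2], o3=objects[i + 2])
    lengths.add(len(tokenizer(prompt)["input_ids"]))

assert len(lengths) == 1, "all prompts should tokenize to the same length"
```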
Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. It mentions generating '1000 examples' for the datasets but does not detail how these examples are partitioned for training, validation, or testing in the analysis of the pre-trained GPT2-Medium model.
Hardware Specification | No | The paper mentions using GPT2-Small and GPT2-Medium models but does not specify any hardware details (e.g., GPU models, CPU types, memory, or cloud computing instances) used for running the experiments.
Software Dependencies | No | The paper mentions 'Transformer Lens (Nanda & Bloom, 2022)' but does not provide specific version numbers for this or any other software component used in the experiments.
Experiment Setup | Yes | In this section, we intervene on the model's forward pass to artificially activate these model components to behave as the IOI task would predict they should. To do this, we intervene on the three inhibition heads (12.3, 13.4, 13.13) and the negative mover head (19.1) we identify, forcing them to attend from the [end] position (the ':' token) to the incorrect color options. Consider the example in Figure 1: we would split the attention on these heads to 50% on the black and yellow tokens. We hypothesize that the model will integrate these interventions in a meaningful way that helps performance.
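A minimal sketch of how such an attention intervention can be expressed with TransformerLens hooks is shown below. The four (layer, head) pairs come from the passage above; the prompt, the way token positions are located, and the decoding step are illustrative assumptions, not the authors' code.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-medium")

# Hypothetical Colored Objects-style prompt: "green" is the correct answer,
# so "black" and "yellow" are the incorrect color options to attend to.
prompt = ("On the table, I see a black pen, a yellow cup, and a green notebook. "
          "What color is the notebook? A:")
tokens = model.to_tokens(prompt)

# Key positions of the two incorrect color tokens in this particular prompt.
wrong_color_positions = [
    int((tokens[0] == model.to_single_token(" black")).nonzero()[0]),
    int((tokens[0] == model.to_single_token(" yellow")).nonzero()[0]),
]

# Three inhibition heads and the negative mover head, as listed above.
HEADS = [(12, 3), (13, 4), (13, 13), (19, 1)]

def make_hook(head_idx):
    def hook_fn(pattern, hook):
        # pattern: [batch, n_heads, query_pos, key_pos]
        end_pos = pattern.shape[2] - 1  # final position (the ':' token)
        pattern[:, head_idx, end_pos, :] = 0.0
        # Split attention 50/50 across the incorrect color tokens.
        for pos in wrong_color_positions:
            pattern[:, head_idx, end_pos, pos] = 1.0 / len(wrong_color_positions)
        return pattern
    return hook_fn

fwd_hooks = [(f"blocks.{layer}.attn.hook_pattern", make_hook(head))
             for layer, head in HEADS]
logits = model.run_with_hooks(tokens, fwd_hooks=fwd_hooks)
print(model.tokenizer.decode(int(logits[0, -1].argmax())))
```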