Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task, and find that it underlies a commonly used abstraction across many context-retrieval behaviors. Specifically, we find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers, forming low-rank communication channels [Elhage et al., 2021] between layers. A particular 3D subspace in model activations in GPT-2 can be traversed to positionally index items in lists, and we show that this mechanism can explain an otherwise arbitrary-seeming sensitivity of the model to the order of items in the prompt. That is, the model has trouble copying the correct information from context when many items "crowd" this limited space. By decomposing attention heads with the Singular Value Decomposition (SVD), we find that previously described interactions between heads separated by one or more layers can be predicted via analysis of their weight matrices alone. We show that it is possible to manipulate the internal model representations as well as edit model weights based on the mechanism we discover in order to significantly improve performance on our synthetic Laundry List task, which requires recall from a list, often improving task accuracy by over 20%. (See the weight-composition sketch after this table.) |
| Researcher Affiliation | Academia | Jack Merullo, Department of Computer Science, Brown University, jack_merullo@brown.edu; Carsten Eickhoff, School of Medicine, University of Tübingen, carsten.eickhoff@uni-tuebingen.de; Ellie Pavlick, Department of Computer Science, Brown University, ellie_pavlick@brown.edu |
| Pseudocode | No | The paper describes its methods but does not include a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | https://github.com/jmerullo/talking_heads.git |
| Open Datasets | Yes | On Open Web Text [Gokaslan and Cohen, 2019], we find that the inhibition heads are primarily active in lists and settings where repetitions should be avoided; for example, in comma separated lists (attending from commas to previously seen items). ... The IOI dataset... [Wang et al., 2022]... We propose a synthetic task... More details on how we generated the data are in Appendix F. |
| Dataset Splits | No | The paper describes the datasets used (e.g., 200 examples for IOI, 250 prompts for Laundry List) but does not provide explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | We primarily used Nvidia 3090 GPUs for this work. Running the linear combinations of inhibition components in Section 5 was the most expensive experiment. Each dataset took about 12 hours on either an RTX 3090 or Quadro RTX GPU. |
| Software Dependencies | No | The paper mentions software like Transformer Lens library, GPT2-Small, and Pythia 160m, but does not specify version numbers for these or other underlying software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | Yes | Our dataset contains 200 examples from the IOI task. We have 100 examples where the IO token is the first name ("Mary and John... John gave a drink to") and 100 where the S1 token comes first ("John and Mary... John gave a drink to"). We test in increments of 10 from [-100, 100] along each axis, including every combination. (See the grid-sweep sketch after this table.) |
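
The Research Type row quotes the paper's claim that interactions between heads separated by one or more layers can be predicted from their weight matrices alone via SVD. The sketch below is not the authors' released code; it is a minimal illustration, using the TransformerLens library mentioned in the paper, of scoring how strongly the low-rank output of an earlier "writer" head feeds the query side of a later "reader" head in GPT2-small. The head indices (inhibition head 8.6, name-mover head 9.9) follow Wang et al. [2022]; the rank cutoff `k=3` and the exact scoring formula are illustrative assumptions.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT2-small

def ov_matrix(layer: int, head: int) -> torch.Tensor:
    # Effective OV circuit: what the head writes into the residual stream.
    return model.W_V[layer, head] @ model.W_O[layer, head]      # [d_model, d_model]

def qk_matrix(layer: int, head: int) -> torch.Tensor:
    # Effective QK circuit: how the head compares queries against keys.
    return model.W_Q[layer, head] @ model.W_K[layer, head].T    # [d_model, d_model]

def q_composition(writer_ov: torch.Tensor, reader_qk: torch.Tensor, k: int = 3) -> float:
    # Truncate the writer's output map to its top-k singular directions and measure
    # how strongly that low-rank signal feeds the reader's query side
    # (an Elhage et al. [2021]-style composition score with an explicit rank cut).
    U, S, Vh = torch.linalg.svd(writer_ov)
    low_rank = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
    return (torch.norm(low_rank @ reader_qk) /
            (torch.norm(low_rank) * torch.norm(reader_qk))).item()

# Inhibition head 8.6 writing to name-mover head 9.9 (indices from Wang et al. [2022]).
score = q_composition(ov_matrix(8, 6), qk_matrix(9, 9), k=3)
print(f"rank-3 Q-composition score for L8H6 -> L9H9: {score:.4f}")
```

Sweeping `k` in a loop shows how much of the interaction is carried by just a few singular directions, which is the weight-level version of the paper's low-rank communication-channel claim.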
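
The Experiment Setup row describes sweeping each axis of the 3D subspace in increments of 10 over [-100, 100], including every combination. Below is a minimal sketch of such a grid intervention using a TransformerLens residual-stream hook; `basis` is a random placeholder for the three subspace directions (the paper derives them from the inhibition heads), and the layer, position, and prompt are illustrative assumptions.

```python
import itertools
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Placeholder for the three residual-stream directions spanning the subspace;
# the paper obtains these from the SVD of the inhibition heads' outputs.
basis = torch.randn(3, model.cfg.d_model, device=model.cfg.device)
basis = basis / basis.norm(dim=-1, keepdim=True)

# Increments of 10 over [-100, 100] on each of three axes: 21^3 = 9261 combinations.
coeff_grid = list(itertools.product(range(-100, 101, 10), repeat=3))

tokens = model.to_tokens("Mary and John went to the store. John gave a drink to")

def make_hook(coeffs):
    steer = sum(c * v for c, v in zip(coeffs, basis))
    def hook(resid, hook):            # resid: [batch, pos, d_model]
        resid[:, -1, :] += steer      # add the offset at the final position
        return resid
    return hook

for coeffs in coeff_grid[:5]:         # loop truncated for illustration
    logits = model.run_with_hooks(
        tokens,
        fwd_hooks=[("blocks.8.hook_resid_post", make_hook(coeffs))],
    )
    next_id = logits[0, -1].argmax().item()
    print(coeffs, "->", model.tokenizer.decode([next_id]))
```

Running the full 9261-point grid per prompt is what makes this sweep the most expensive experiment reported in the Hardware Specification row.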