Successor Heads: Recurring, Interpretable Attention Heads In The Wild

Authors: Rhys Gould, Euan Ong, George Ogden, Arthur Conmy

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we analyze the behavior of successor heads in LLMs and find that they implement abstract representations that are common to different architectures. We perform vector arithmetic with these features to edit head behavior and provide insights into numeric representations within LLMs. Additionally, we study the behavior of successor heads on natural language data, where we find that successor heads are important for achieving a low loss on examples involving succession, and also identify interpretable polysemanticity in a Pythia successor head.
Researcher Affiliation | Collaboration | 1 University of Cambridge; 2 Independent. Correspondence to rg664@cam.ac.uk.
Pseudocode | No | The paper describes experimental procedures and methods but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | No | Full details of our dataset can be found in our open-sourced experiments. Available soon at https://github.com/euanong/numeric-representations/blob/main/exp_numeric_representations/model.py#L19
Open Datasets | Yes | Full details of our dataset can be found in our open-sourced experiments. Available soon at https://github.com/euanong/numeric-representations/blob/main/exp_numeric_representations/model.py#L19. In order to characterize the behavior of Pythia-1.4B's successor head on natural-language data, we randomly sample 128 length-512 contexts from The Pile. (See the sampling sketch after the table.)
Dataset Splits | Yes | We train the SAE using number tokens from 0 to 500, both with and without a leading space (' 123' and '123'), alongside other tasks such as number words, cardinal words, days, months, etc. 90% of these tokens go into the train set, and the remaining 10% into the validation set. (See the dataset-split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper does not specify software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup | Yes | We used the hyperparameters D = 512 and λ = 0.3, with a batch size of 64, and trained for 100 epochs. We use a learning rate of 0.001, and a batch size of 32, for 100 epochs. (See the training sketch after the table.)
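
To make the quoted natural-language evaluation concrete, the sketch below samples 128 contexts of 512 tokens each from The Pile for Pythia-1.4B. This is a hypothetical illustration only: the dataset identifier, sampling seed, and context-selection policy are assumptions, not details taken from the paper.

```python
import random

from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical sketch: sample 128 length-512 token contexts from The Pile.
# The dataset identifier below is an assumption and may differ from the
# authors' source; The Pile is no longer hosted at its original location.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")
pile = load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)

contexts = []
rng = random.Random(0)
for doc in pile:
    ids = tokenizer(doc["text"])["input_ids"]
    if len(ids) >= 512:
        # Take a random 512-token window from documents that are long enough.
        start = rng.randrange(len(ids) - 512 + 1)
        contexts.append(ids[start:start + 512])
    if len(contexts) == 128:
        break
```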
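
The dataset-split row quotes a 90/10 train/validation split over succession-task tokens. The sketch below shows one way such a split could be built; the number-token range and leading-space handling follow the quote, while the day and month lists are illustrative placeholders rather than the paper's exact task data.

```python
import random

def build_succession_token_dataset(seed: int = 0):
    """Hypothetical sketch of the SAE training tokens and the 90/10 split.

    Number tokens 0-500 appear with and without a leading space, as in the
    quoted excerpt; the remaining task lists are illustrative placeholders.
    """
    tokens = []
    for n in range(501):                       # number tokens "0" ... "500"
        tokens.append(str(n))
        tokens.append(" " + str(n))            # same numbers with a leading space
    tokens += ["Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday"]              # days of the week
    tokens += ["January", "February", "March", "April"]     # months (truncated here)

    rng = random.Random(seed)
    rng.shuffle(tokens)
    split = int(0.9 * len(tokens))             # 90% train / 10% validation
    return tokens[:split], tokens[split:]

train_tokens, val_tokens = build_succession_token_dataset()
```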
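
Finally, a minimal training sketch consistent with the quoted experiment-setup hyperparameters (dictionary size D = 512, sparsity coefficient λ = 0.3, batch size 64, 100 epochs, learning rate 0.001). The quote appears to mix two training runs with different batch sizes; the sketch uses the SAE settings and treats the loss form, module names, and optimizer as assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch; the paper's exact architecture may differ."""

    def __init__(self, d_model: int, d_dict: int = 512):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))    # sparse feature activations
        return self.decoder(feats), feats

def train_sae(activations: torch.Tensor, d_dict: int = 512, lam: float = 0.3,
              batch_size: int = 64, epochs: int = 100, lr: float = 1e-3):
    """Train with an L2 reconstruction term plus an L1 sparsity penalty (assumed form)."""
    sae = SparseAutoencoder(activations.shape[-1], d_dict)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(activations), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for (x,) in loader:
            recon, feats = sae(x)
            loss = ((recon - x) ** 2).mean() + lam * feats.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return sae
```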