Explorations of Self-Repair in Language Models
Authors: Cody Rushing, Neel Nanda
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We strengthen this prior work by investigating self-repair across the whole pretraining distribution, by focusing on individual attention heads, and by investigating the mechanisms behind self-repair on the whole distribution. Our key findings are: 1. Direct effect self-repair is an imperfect, noisy process. It occurs across the full pretraining distribution, even while ablating individual heads (rather than full layers). |
| Researcher Affiliation | Academia | Cody Rushing (University of Texas at Austin), Neel Nanda. Correspondence to: Cody Rushing <thisiscodyr@gmail.com>. |
| Pseudocode | No | The paper describes experimental procedures and definitions but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All the code for the experiments used in this paper is provided at https://github.com/starship006/backup_research. |
| Open Datasets | Yes | We measure the self-repair of individual attention heads in Pythia-1B (Biderman et al., 2023) on 1 million tokens of The Pile (Gao et al., 2020), the dataset used to train Pythia-1B. |
| Dataset Splits | No | The paper mentions using 1 million tokens from The Pile for experiments and filters for the top 2% of tokens by direct effect, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions using 'Center for AI Safety Compute Cluster' but without specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., PyTorch 1.9, Python 3.8). It only links to a GitHub repository for the code; the repository may pin such dependencies, but versions are not stated in the paper text. |
| Experiment Setup | No | The paper describes its methodology for measuring direct effect and self-repair, including resample ablations and filtering for high-direct-effect tokens. However, it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or configurations for the models studied (Pythia, GPT-2, Llama). Such details would typically appear in an 'Experiment Setup' or 'Implementation Details' section, which is absent. |
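To make the measured quantities concrete: the paper studies how much of an ablated head's direct effect the rest of the network compensates for. The sketch below is an illustrative formalization only, not the paper's code; the function name, scalar interface, and sign convention are assumptions made for clarity.

```python
def self_repair(clean_logit: float, ablated_logit: float,
                direct_effect: float) -> float:
    """Estimate self-repair for one head at one token position.

    If the head's direct effect were removed with no downstream
    compensation, the correct-token logit would drop by exactly
    `direct_effect`. The observed drop after ablation is usually
    smaller; the gap is attributed to self-repair.
    """
    observed_drop = clean_logit - ablated_logit
    return direct_effect - observed_drop

# A head with direct effect 2.0 whose ablation only lowers the
# correct-token logit by 1.2 exhibits 0.8 logits of self-repair.
print(self_repair(clean_logit=5.0, ablated_logit=3.8, direct_effect=2.0))
```

In the paper's actual experiments, the ablated logits come from resample ablation (replacing the head's output with its output on other tokens) rather than from a hypothetical scalar, and the computation is batched over 1 million tokens of The Pile.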