Locating and Editing Factual Associations in GPT

Authors: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feedforward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task. We also evaluate ROME on a new dataset of difficult counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another.
Researcher Affiliation | Academia | Kevin Meng, MIT CSAIL; David Bau, Northeastern University; Alex Andonian, MIT CSAIL; Yonatan Belinkov, Technion – IIT
Pseudocode | No | The paper describes its method steps (Step 1: Choosing k to Select the Subject; Step 2: Choosing v to Recall the Fact; Step 3: Inserting the Fact) in Section 3.1, but these are presented as descriptive text rather than formal pseudocode or algorithm blocks. A hedged sketch of the resulting rank-one update appears after this table.
Open Source Code | Yes | The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/.
Open Datasets | Yes | The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/. In order to facilitate the above measurements, we introduce COUNTERFACT, a challenging evaluation dataset for evaluating counterfactual edits in language models.
Dataset Splits | No | Our evaluation slice contains 10,000 records, each containing one factual statement, its paraphrase, and one unrelated factual statement. Table 4 showcases quantitative results on GPT-2 XL (1.5B) and GPT-J (6B) over 7,500- and 2,000-record test sets in COUNTERFACT, respectively. The paper specifies the sizes of its evaluation and test sets, but it does not explain how the data was divided into training, validation, and test portions, nor does it refer to standard splits.
Hardware Specification | Yes | All experiments were run on a single NVIDIA 3090 GPU, with the exception of GPT-J, which was run on an NVIDIA A100 GPU at the Technion, and GPT-2 XL baselines, which were run on a cluster of A100s at Northeastern. (Appendix E.1)
Software Dependencies | No | The codebase is written in Python using PyTorch. (Appendix E.1) While the paper mentions Python and PyTorch, it does not specify version numbers for these or any other software components.
Experiment Setup | Yes | We optimize v for 100 steps using Adam (Kingma & Ba, 2015) with an initial learning rate of 0.0001, linearly decaying to 0. (Appendix E.5) Our Adam optimizer (Kingma & Ba, 2015) uses a learning rate of 1e-4 and a batch size of 1. We trained for 100 steps for each fact. (Appendix E.3) A sketch of this optimization loop follows the table.
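
The Pseudocode row notes that the paper's three steps are described only in prose. As a reading aid, here is a minimal sketch of the rank-one update those steps produce, assuming the key vector k_star (Step 1), the optimized value vector v_star (Step 2), and an uncentered key covariance C estimated from sample text are already in hand; the function and variable names are illustrative, not the authors' released code.

```python
# Hedged sketch of a ROME-style rank-one edit (Step 3: Inserting the Fact).
# Assumes k_star, v_star, and C were computed beforehand; names are illustrative.
import torch

def rank_one_edit(W: torch.Tensor,       # (d, d_m) second MLP matrix at the edited layer
                  k_star: torch.Tensor,  # (d_m,)   key: MLP input representing the subject
                  v_star: torch.Tensor,  # (d,)     value: optimized to recall the new fact
                  C: torch.Tensor        # (d_m, d_m) uncentered covariance E[k k^T] of keys
                  ) -> torch.Tensor:
    # Solve C u = k_star rather than forming C^{-1} explicitly.
    u = torch.linalg.solve(C, k_star)           # (d_m,)
    # How far the current weights are from producing the desired value.
    residual = v_star - W @ k_star              # (d,)
    # Rank-one correction: the edited matrix maps k_star exactly to v_star
    # while perturbing responses to other keys as little as possible.
    scale = residual / (u @ k_star)             # (d,)
    return W + torch.outer(scale, u)            # (d, d_m)
```

Solving the linear system instead of inverting C is a numerical choice made for this sketch; the paper's own derivation is given in Section 3.1.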
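
Similarly, the Experiment Setup row quotes only optimizer hyperparameters. Below is a minimal sketch of that optimization loop for v, assuming the paper's loss is supplied as a callable (the name `objective` is a hypothetical stand-in); only Adam, a 1e-4 learning rate decayed linearly to 0, and 100 steps are taken from the quoted text.

```python
# Hedged sketch of the v-optimization (Step 2), using only the quoted
# hyperparameters: Adam, lr 1e-4 decayed linearly to 0, 100 steps, batch size 1.
import torch

def solve_for_v(v_init: torch.Tensor, objective, num_steps: int = 100) -> torch.Tensor:
    v = v_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([v], lr=1e-4)
    # Linear decay of the learning rate from 1e-4 down to 0 over the run.
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda step: max(0.0, 1.0 - step / num_steps))
    for _ in range(num_steps):
        opt.zero_grad()
        loss = objective(v)   # loss term(s) as defined in Section 3.1 of the paper
        loss.backward()
        opt.step()
        sched.step()
    return v.detach()
```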