Mass-Editing Memory in a Transformer

Authors: Kevin Meng, Arnab Sen Sharma, Alex J Andonian, Yonatan Belinkov, David Bau

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at memit.baulab.info.
Researcher Affiliation | Academia | Kevin Meng (1,2), Arnab Sen Sharma (2), Alex Andonian (1), Yonatan Belinkov (3), David Bau (2); 1: MIT CSAIL, 2: Northeastern University, 3: Technion – IIT
Pseudocode | Yes | Algorithm 1 summarizes MEMIT, and additional implementation details are offered in Appendix B.
Open Source Code | Yes | Our code and data are at memit.baulab.info.
Open Datasets | Yes | We first test MEMIT on zsRE (Levy et al., 2017), a question-answering task from which we extract 10,000 real-world facts; zsRE tests MEMIT's ability to add correct information. Next, we test MEMIT's ability to add counterfactual information using COUNTERFACT, a collection of 21,919 factual statements (Meng et al. (2022), Appendix C).
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. While it mentions evaluation metrics for generalization and specificity, it does not specify how the datasets themselves were partitioned for training and validation.
Hardware Specification | Yes | All experiments are run on workstations with NVIDIA A6000 GPUs. GPT-J experiments fit into one 48GB A6000, but GPT-NeoX runs require at least two: one 48GB GPU for running the model in float16, and another slightly smaller GPU for executing the editing method.
Software Dependencies | Yes | The language models are loaded using Hugging Face Transformers (Wolf et al., 2019), and PyTorch (Paszke et al., 2019) is used for executing the model editing algorithms on GPUs. (A model-loading sketch follows the table.)
Experiment Setup | Yes | The default ROME hyperparameters are available in their open source code: GPT-J updates are executed at layer 5, where optimization proceeds for 20 steps with a weight decay of 0.5, KL factor of 0.0625, and learning rate of 5e-1. On GPT-J, we choose R = {3, 4, 5, 6, 7, 8} and set λ, the covariance adjustment factor, to 15,000. δ_i optimization proceeds for 25 steps with a learning rate of 5e-1. (A configuration sketch follows the table.)
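The Hardware Specification and Software Dependencies rows describe loading the models in float16 on A6000 GPUs via Hugging Face Transformers and PyTorch. Below is a minimal loading sketch under those assumptions; the Hugging Face model identifiers and the exact device placement are assumptions, since the quoted text does not spell them out.

    # Minimal sketch of the model-loading step described in the Hardware and
    # Software Dependencies rows. The model identifiers are the standard
    # Hugging Face names and are an assumption, not quoted from the paper.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "EleutherAI/gpt-j-6B"  # or "EleutherAI/gpt-neox-20b"

    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,  # the paper runs the models in float16
    ).to("cuda:0")                  # model weights on one 48GB A6000
    model.eval()

    # Per the hardware note, GPT-NeoX-20B runs keep the model on one GPU and
    # place the editing method's own computations on a second, smaller GPU.
    edit_device = torch.device(
        "cuda:1" if torch.cuda.device_count() > 1 else "cuda:0"
    )

With this split, the frozen model's forward passes stay on cuda:0 while the editing computations can run on edit_device, which is one way to satisfy the two-GPU requirement reported for GPT-NeoX.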
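The Experiment Setup row quotes the GPT-J hyperparameters verbatim. The snippet below only restates those numbers as Python configuration dictionaries for readability; the key names are hypothetical and are not taken from the authors' released code.

    # Illustrative restatement of the quoted GPT-J hyperparameters. Key names
    # are hypothetical and chosen for readability; they are not claimed to
    # match the schema of the authors' released configuration files.
    MEMIT_GPTJ_HPARAMS = {
        "layers": [3, 4, 5, 6, 7, 8],     # the edited layer range R
        "cov_adjustment_lambda": 15_000,  # λ, the covariance adjustment factor
        "delta_opt_steps": 25,            # steps of δ_i optimization
        "delta_lr": 5e-1,                 # learning rate for δ_i optimization
    }

    ROME_GPTJ_DEFAULTS = {
        "edit_layer": 5,
        "opt_steps": 20,
        "weight_decay": 0.5,
        "kl_factor": 0.0625,
        "lr": 5e-1,
    }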