Massive Editing for Large Language Models via Meta Learning

Authors: Chenmien Tan, Ge Zhang, Jie Fu

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed book fact-checking and question answering. |
| Researcher Affiliation | Collaboration | Chenmien Tan (University of Edinburgh), Ge Zhang (University of Waterloo; 01.AI), Jie Fu (HKUST) |
| Pseudocode | Yes | Algorithm 1: Editor Inference |
| Open Source Code | Yes | Our code is available at https://github.com/ChenmienTan/malmen. |
| Open Datasets | Yes | For BERT-base, we use the Fact Extraction and VERification (FEVER) dataset (Thorne et al., 2018) with the identical train/val splits as De Cao et al. (2021) and Mitchell et al. (2022), which contains 104,996 training and 10,444 validation samples. |
| Dataset Splits | Yes | For BERT-base, we use the Fact Extraction and VERification (FEVER) dataset (Thorne et al., 2018) with the identical train/val splits as De Cao et al. (2021) and Mitchell et al. (2022), which contains 104,996 training and 10,444 validation samples. |
| Hardware Specification | Yes | As for computation time, it takes 12.25 and 33.75 hours in total (including training) for MALMEN and MEMIT to edit 16,384 facts on GPT-J using a single NVIDIA A100 GPU, respectively. |
| Software Dependencies | No | The paper mentions optimizers like Adam and AdamW and uses various language models, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | We use identical hyper-parameters for MEND and MALMEN as follows: rank of linear transformation in hyper-network 1920; number of blocks in hyper-network 2; initial learning rate 1e-6; meta-learning rate 1e-5; locality coefficient 1; maximum meta gradient norm 1. |
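
The Pseudocode row above names Algorithm 1 (Editor Inference) without reproducing it. The paper's abstract describes aggregating the parameter shifts for many edits by formulating the aggregation as a least-squares problem solved via the normal equation; below is a minimal schematic sketch of that aggregation step for a single linear layer. The tensor shapes, function name, and damping term `lam` are assumptions for illustration, not the repository's actual interface.

```python
import torch

def aggregate_shift(keys: torch.Tensor, value_diffs: torch.Tensor,
                    lam: float = 1e-4) -> torch.Tensor:
    """Aggregate many edits into one parameter shift for a linear layer.

    Solves min_S ||S @ keys - value_diffs||^2 + lam * ||S||^2
    in closed form via the (damped) normal equation.

    keys:        (d_in, n_edits)  activations feeding the edited layer
    value_diffs: (d_out, n_edits) desired changes to the layer's outputs
    Returns S of shape (d_out, d_in).
    """
    d_in = keys.shape[0]
    gram = keys @ keys.T + lam * torch.eye(d_in, dtype=keys.dtype,
                                           device=keys.device)
    # S = D K^T (K K^T + lam I)^{-1}; solve the symmetric system
    # rather than forming an explicit inverse.
    return torch.linalg.solve(gram, keys @ value_diffs.T).T
```

The paper motivates this single closed-form solve as better behaved than naively summing per-edit updates when the number of facts grows large.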
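The Open Datasets and Dataset Splits rows quote exact split sizes, which makes for an easy reproducibility spot-check once the FEVER splits from De Cao et al. (2021) are downloaded. A minimal sketch, assuming the splits ship as JSONL files; both file names here are hypothetical:

```python
# Hypothetical file names: the actual layout depends on where the
# FEVER splits from De Cao et al. (2021) are downloaded to.
EXPECTED = {"fever_train.jsonl": 104_996, "fever_val.jsonl": 10_444}

for path, expected in EXPECTED.items():
    with open(path) as f:
        n = sum(1 for line in f if line.strip())  # count non-empty records
    status = "OK" if n == expected else f"MISMATCH (expected {expected:,})"
    print(f"{path}: {n:,} records -> {status}")
```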
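The timing figures in the Hardware Specification row reduce to a useful per-fact cost; a quick derivation from the quoted totals:

```python
# Wall-clock totals (hours, including training) reported for editing
# 16,384 facts on GPT-J with a single NVIDIA A100 GPU.
totals_hours = {"MALMEN": 12.25, "MEMIT": 33.75}
n_facts = 16_384

for method, hours in totals_hours.items():
    print(f"{method}: {hours * 3600 / n_facts:.2f} s per edited fact")
# MALMEN: 2.69 s per edited fact
# MEMIT: 7.42 s per edited fact
```

That is, roughly 2.7 s per fact for MALMEN versus 7.4 s for MEMIT under the reported setup.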
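Finally, the shared MEND/MALMEN hyper-parameters in the Experiment Setup row map onto a flat configuration. A minimal sketch with values taken from that row; the key names are illustrative, not the MALMEN repository's actual config schema:

```python
# Values from the paper's shared MEND/MALMEN hyper-parameter table;
# the key names are illustrative, not the repository's config schema.
SHARED_HPARAMS = {
    "hypernet_rank": 1920,      # rank of linear transformation in hyper-network
    "hypernet_blocks": 2,       # number of blocks in hyper-network
    "initial_lr": 1e-6,         # initial learning rate
    "meta_lr": 1e-5,            # meta-learning rate
    "locality_coef": 1.0,       # locality coefficient
    "max_meta_grad_norm": 1.0,  # maximum meta gradient norm (clipping)
}
```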