Massive Editing for Large Language Models via Meta Learning
Authors: Chenmien Tan, Ge Zhang, Jie Fu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed book fact-checking and question answering. |
| Researcher Affiliation | Collaboration | Chenmien Tan (University of Edinburgh), Ge Zhang (University of Waterloo, 01.AI), Jie Fu (HKUST) |
| Pseudocode | Yes | Algorithm 1: Editor Inference |
| Open Source Code | Yes | Our code is available at https://github.com/ChenmienTan/malmen. |
| Open Datasets | Yes | For BERT-base, we use the Fact Extraction and VERtification (FEVER) dataset (Thorne et al., 2018) with the identical train/val splits with De Cao et al. (2021); Mitchell et al. (2022), which contains 104,996 training and 10,444 validation samples. |
| Dataset Splits | Yes | For BERT-base, we use the Fact Extraction and VERtification (FEVER) dataset (Thorne et al., 2018) with the identical train/val splits with De Cao et al. (2021); Mitchell et al. (2022), which contains 104,996 training and 10,444 validation samples. |
| Hardware Specification | Yes | As for computation time, it takes 12.25 and 33.75 hours in total (including training) for MALMEN and MEMIT to edit 16,384 facts on GPT-J using a single NVIDIA A100 GPU, respectively. |
| Software Dependencies | No | The paper mentions optimizers like Adam and AdamW and uses various language models, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | We use identical hyper-parameters for MEND and MALMEN as follows: rank of linear transformation in hyper-network 1920; number of blocks in hyper-network 2; initial learning rate 1e-6; meta-learning rate 1e-5; locality coefficient 1; maximum meta gradient norm 1 |
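
For readability, the shared MEND/MALMEN hyper-parameters from the Experiment Setup row can be collected into a small configuration object. This is a minimal sketch for this summary; the key names are illustrative assumptions and do not come from the released MALMEN repository.

```python
# Sketch of the shared MEND/MALMEN hyper-parameters reported above.
# Key names are chosen for this summary, not identifiers from the MALMEN code base.
hparams = {
    "hypernet_rank": 1920,        # rank of linear transformation in hyper-network
    "hypernet_blocks": 2,         # number of blocks in hyper-network
    "initial_lr": 1e-6,           # initial learning rate
    "meta_lr": 1e-5,              # meta-learning rate
    "locality_coef": 1.0,         # locality coefficient
    "max_meta_grad_norm": 1.0,    # maximum meta gradient norm
}

if __name__ == "__main__":
    # Print the configuration so it can be checked against the paper's table.
    for name, value in hparams.items():
        print(f"{name}: {value}")
```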