Fast Model Editing at Scale
Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that effectively edits the behavior of models with more than 10 billion parameters. |
| Researcher Affiliation | Academia | Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning Stanford University eric.mitchell@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 MEND Training Algorithm 2 MEND Edit Procedure |
| Open Source Code | Yes | Code available at https://sites.google.com/view/mend-editing. |
| Open Datasets | Yes | Specifically, for seq2seq models, we use the zsRE question-answering dataset (Levy et al., 2017) ... For classification models (e.g., BERT), we use the FEVER fact-checking dataset (Thorne et al., 2018)... |
| Dataset Splits | Yes | For all algorithms, we use early stopping to end training early if the validation loss L = c_edit · L_e + L_loc does not decrease for 20,000 steps on a subset of 500 validation examples, with a maximum number of training steps of 500,000. |
| Hardware Specification | Yes | All runs are trained entirely on a single NVIDIA RTX Titan or A40 GPU. |
| Software Dependencies | Yes | We use PyTorch (Paszke et al., 2019) for all experiments, specifically using the Higher library (Grefenstette et al., 2019) in order to implement the bi-level optimization in ENN as well as the inner loop of model editing for all algorithms. |
| Experiment Setup | Yes | We use edit learning rates of 5e-6 for GPT-Neo and GPT-J, 1e-4 for T5 models, and 1e-6 for the smaller models... We use a batch size of 10 (with gradient accumulation) and seed 0 for all experiments. |
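The stopping criterion and combined loss quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`total_loss`, `should_stop`) are hypothetical, and only the constants (patience of 20,000 steps, a 500,000-step cap, and the loss L = c_edit · L_e + L_loc) come from the reported setup.

```python
def total_loss(edit_loss, locality_loss, c_edit):
    # Combined validation objective from the reported setup:
    # L = c_edit * L_e + L_loc
    return c_edit * edit_loss + locality_loss


def should_stop(step, steps_since_best, patience=20_000, max_steps=500_000):
    # Early stopping as described: end training if the validation loss has
    # not decreased for `patience` steps, or once `max_steps` is reached.
    return step >= max_steps or steps_since_best >= patience
```

In a training loop, `steps_since_best` would be reset to zero whenever the validation loss on the 500-example subset reaches a new minimum.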