Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Memory-Based Model Editing at Scale

Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments indicate that SERAC consistently outperforms past approaches to model editing by a substantial margin on the three most difficult problems. Code, data, and additional project information will be made available at https://sites.google.com/view/serac-editing."
Researcher Affiliation | Academia | "1 Stanford University, Department of Computer Science; 2 EPFL, School of Computer and Communication Sciences."
Pseudocode | No | The paper does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | "Code, data, and additional project information will be made available at https://sites.google.com/view/serac-editing."
Open Datasets | Yes | "The QA setting uses the zsRE question-answering problem introduced by De Cao et al. (2021). We use this dataset as a starting point of reference to connect our evaluations with prior work. ... We introduce the FC setting, building on the VitaminC fact verification dataset (Schuster et al., 2021). ... As a base model, we use the BERT-base model trained by De Cao et al. (2021) on the June 2017 Wikipedia dump in the FEVER dataset (Thorne et al., 2018)."
Dataset Splits | Yes | "Data were randomly split (by entity) into 90-5-5 train/val/test splits."
Hardware Specification | No | The paper mentions models like T5-large and BERT-base but does not specify the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using "Huggingface (Wolf et al., 2019) implementations" and specific models such as "distilbert-base-cased (Sanh et al., 2019)", but does not give version numbers for the software libraries or frameworks used (e.g., PyTorch version, Transformers library version).
Experiment Setup | Yes | "We use Adam with an outer-loop learning rate of 1 × 10⁻⁵, and an initial inner-loop learning rate of 1 × 10⁻², which is learned in the outer loop. ... All scope classifier and counterfactual models are trained using Adam with a learning rate of 1 × 10⁻⁵."
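The entity-based 90-5-5 split reported above can be sketched as follows. This is a minimal illustration, not the authors' code; the `entity` field name and the `split_by_entity` helper are hypothetical, but the key property matches the quoted description: splitting is done by entity, so all examples mentioning a given entity fall into exactly one of the train/val/test partitions.

```python
import random

def split_by_entity(examples, seed=0):
    """Split examples 90/5/5 into train/val/test by entity, so that
    every example for a given entity lands in a single split."""
    # "entity" is a hypothetical field name used for illustration.
    entities = sorted({ex["entity"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n = len(entities)
    train_ents = set(entities[: int(0.90 * n)])
    val_ents = set(entities[int(0.90 * n) : int(0.95 * n)])
    splits = {"train": [], "val": [], "test": []}
    for ex in examples:
        if ex["entity"] in train_ents:
            splits["train"].append(ex)
        elif ex["entity"] in val_ents:
            splits["val"].append(ex)
        else:
            splits["test"].append(ex)
    return splits
```

Splitting by entity rather than by individual example prevents leakage: the model is never evaluated on edits about an entity it saw during training.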