History Matters: Temporal Knowledge Editing in Large Language Model

Authors: Xunjian Yin, Jin Jiang, Liming Yang, Xiaojun Wan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this work, we introduce the task of Temporal Knowledge Editing (TKE) and establish a benchmark ATOKE (Assessment of TempOral Knowledge Editing) to evaluate current model editing methods. We find that while existing model editing methods are effective at making models remember new knowledge, the edited model catastrophically forgets historical knowledge. To address this gap, we propose a simple and general framework termed Multi-Editing with Time Objective (METO) for enhancing existing editing models, which edits both historical and new knowledge concurrently and optimizes the model's prediction for the time of each fact. Our assessments demonstrate that while ATOKE is still difficult, METO maintains the effectiveness of learning new knowledge and meanwhile substantially improves the performance of edited models on utilizing historical knowledge. ... We conduct experiments with several popular editing methods and analyze the results. The performance of existing knowledge editing methods on our benchmarks is shown in Table 2.
Researcher Affiliation Academia Xunjian Yin1,2, Jin Jiang1, Liming Yang3, Xiaojun Wan1,2 1Wangxuan Institute of Computer Technology, Peking University 2Center for Data Science, Peking University 3School of Law, Tsinghua University
Pseudocode No The paper describes the METO framework conceptually and with a diagram (Figure 3), but it does not include a structured pseudocode or algorithm block.
Open Source Code Yes Our benchmark has been released to the community to facilitate future research 1. [Footnote]: 1https://github.com/Arvid-pku/ATOKE
Open Datasets Yes Our dataset is based on YAGO3.0.3 (Mahdisoltani, Biega, and Suchanek 2015), a knowledge base comprising fact triples associated with millions of entities extracted from Wikipedia. ... Our benchmark has been released to the community to facilitate future research 1.
Dataset Splits No The paper uses the pre-trained GPT-J model (which has its own training and validation data), but it does not explicitly define training, validation, or test splits for the ATOKE datasets constructed for the experiments in this paper.
Hardware Specification No The paper mentions using 'GPT-J (6B)' as the base LLM, but it does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies No The paper mentions using GPT-J (6B) and ChatGPT (gpt-3.5-turbo), which are language models. It also refers to YAGO for data. However, it does not list any specific software dependencies such as programming languages, libraries, or frameworks with their version numbers.
Experiment Setup No The paper describes the general approach for applying existing knowledge editing methods (CFT, MEND, ROME, MEMIT) and the METO framework. However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training settings used for their experiments.
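Since the paper provides no pseudocode for METO, the core idea quoted above (editing historical and new knowledge concurrently, each with a time-prediction objective) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `TemporalFact` dataclass, the `build_meto_requests` helper, and the prompt templates are all hypothetical names and formats chosen for clarity.

```python
# Hypothetical sketch of the METO idea: rather than issuing one edit request
# for only the new fact, build edit requests for BOTH the historical and the
# new fact, each conditioned on its validity period, plus an auxiliary
# "time objective" target asking the model to predict when each fact held.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class TemporalFact:
    subject: str
    relation: str
    obj: str
    start: int            # year the fact became true
    end: Optional[int]    # year it stopped being true (None = still true)

def build_meto_requests(history: List[TemporalFact],
                        new: TemporalFact) -> List[Dict[str, str]]:
    """Create one edit request per fact, pairing a time-conditioned fact
    prompt with a time-prediction target (the 'time objective')."""
    requests = []
    for fact in history + [new]:
        span = f"from {fact.start}" + (
            f" to {fact.end}" if fact.end is not None else " until now")
        requests.append({
            # teach the fact itself, conditioned on its validity period
            "prompt": f"{span.capitalize()}, the {fact.relation} of {fact.subject} was",
            "target": fact.obj,
            # auxiliary time objective: predict when the fact held
            "time_prompt": f"{fact.obj} was the {fact.relation} of {fact.subject}",
            "time_target": span,
        })
    return requests

history = [TemporalFact("the USA", "president", "Donald Trump", 2017, 2021)]
new_fact = TemporalFact("the USA", "president", "Joe Biden", 2021, None)
for req in build_meto_requests(history, new_fact):
    print(req["prompt"], "->", req["target"])
```

In an actual editing pipeline, each request would be passed to an underlying editor (e.g. ROME or MEMIT) as a separate edit, so the model retains the historical fact alongside the new one instead of overwriting it.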