History Matters: Temporal Knowledge Editing in Large Language Model
Authors: Xunjian Yin, Jin Jiang, Liming Yang, Xiaojun Wan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we introduce the task of Temporal Knowledge Editing (TKE) and establish a benchmark ATOKE (Assessment of TempOral Knowledge Editing) to evaluate current model editing methods. We find that while existing model editing methods are effective at making models remember new knowledge, the edited model catastrophically forgets historical knowledge. To address this gap, we propose a simple and general framework termed Multi-Editing with Time Objective (METO) for enhancing existing editing models, which edits both historical and new knowledge concurrently and optimizes the model's prediction for the time of each fact. Our assessments demonstrate that while ATOKE is still difficult, METO maintains the effectiveness of learning new knowledge and meanwhile substantially improves the performance of edited models on utilizing historical knowledge. ... We conduct experiments with several popular editing methods and analyze the results. The performance of existing knowledge editing methods on our benchmarks is shown in Table 2. (See the loss sketch below the table.) |
| Researcher Affiliation | Academia | Xunjian Yin (1,2), Jin Jiang (1), Liming Yang (3), Xiaojun Wan (1,2); 1: Wangxuan Institute of Computer Technology, Peking University; 2: Center for Data Science, Peking University; 3: School of Law, Tsinghua University |
| Pseudocode | No | The paper describes the METO framework conceptually and with a diagram (Figure 3), but it does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Our benchmark has been released to the community to facilitate future research. [Footnote 1]: https://github.com/Arvid-pku/ATOKE |
| Open Datasets | Yes | Our dataset is based on YAGO 3.0.3 (Mahdisoltani, Biega, and Suchanek 2015), a knowledge base comprising fact triples associated with millions of entities extracted from Wikipedia. ... Our benchmark has been released to the community to facilitate future research. (See the data-loading sketch below the table.) |
| Dataset Splits | No | The paper uses the pretrained GPT-J model, but it does not explicitly define training, validation, or testing splits for the ATOKE datasets constructed for its experiments. |
| Hardware Specification | No | The paper mentions using 'GPT-J (6B)' as the base LLM, but it does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using GPT-J (6B) and ChatGPT (gpt-3.5-turbo), which are language models. It also refers to YAGO 3.0.3 for data. However, it does not list any specific software dependencies such as programming languages, libraries, or frameworks with their version numbers. |
| Experiment Setup | No | The paper describes the general approach for applying existing knowledge editing methods (CFT, MEND, ROME, MEMIT) and the METO framework. However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed system-level training settings used in its experiments. (See the placeholder configuration sketch below the table.) |
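
To make the METO description in the Research Type row concrete, here is a minimal, hypothetical sketch of how a time objective could be combined with standard editing losses. The function name `meto_loss`, the weighting term `alpha`, and the use of a cross-entropy time-prediction term are illustrative assumptions, not the paper's specification; the ATOKE repository defines the actual objective.

```python
import torch
import torch.nn.functional as F

def meto_loss(hist_logits: torch.Tensor, hist_target: torch.Tensor,
              new_logits: torch.Tensor, new_target: torch.Tensor,
              time_logits: torch.Tensor, time_target: torch.Tensor,
              alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical Multi-Editing with Time Objective loss.

    Edits historical and new knowledge concurrently (two editing
    terms) and adds an auxiliary term that optimizes the model's
    prediction of the time associated with each fact. The exact
    formulation and the value of `alpha` are assumptions.
    """
    edit_hist = F.cross_entropy(hist_logits, hist_target)  # retain the historical fact
    edit_new = F.cross_entropy(new_logits, new_target)     # inject the new fact
    time_obj = F.cross_entropy(time_logits, time_target)   # predict each fact's time
    return edit_hist + edit_new + alpha * time_obj
```

In practice each term would be computed from the edited model's token logits over the corresponding prompt; the sketch only shows how the three objectives might be combined.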
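For readers inspecting the released benchmark, the following sketch shows one plausible way to represent ATOKE's time-qualified facts (derived from YAGO-style triples) and to load a local copy of the data. The record fields and the JSON layout are assumptions; consult https://github.com/Arvid-pku/ATOKE for the actual schema.

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass
class TemporalFact:
    """A time-qualified fact triple, in the spirit of YAGO's
    (subject, relation, object) triples with validity periods.
    Field names are illustrative, not ATOKE's actual schema."""
    subject: str
    relation: str
    obj: str
    start: int | None  # first year the fact holds, if known
    end: int | None    # last year the fact holds; None if ongoing

def load_atoke_split(path: str) -> list[TemporalFact]:
    """Load one benchmark file from a local clone of the ATOKE repo.
    Assumes a JSON list of dicts; the real file layout may differ."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        TemporalFact(
            subject=r["subject"],
            relation=r["relation"],
            obj=r["object"],
            start=r.get("start"),
            end=r.get("end"),
        )
        for r in raw
    ]
```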
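Because the paper does not report hyperparameters, anyone reproducing the experiments must choose their own. The dataclass below merely illustrates the kind of setup specification the Experiment Setup row finds missing; every default value is a placeholder, not a number from the paper.

```python
from dataclasses import dataclass

@dataclass
class EditRunConfig:
    """Placeholder experiment configuration for running an editing
    method (e.g., ROME, MEMIT, MEND, or fine-tuning) on ATOKE.
    All values are hypothetical defaults, not the paper's settings."""
    method: str = "ROME"                # editing method to apply
    base_model: str = "gpt-j-6b"        # base LLM named in the paper
    learning_rate: float = 5e-4         # placeholder; not reported
    batch_size: int = 1                 # placeholder; not reported
    num_steps: int = 20                 # placeholder; not reported
    time_objective_weight: float = 1.0  # placeholder METO weight
```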