Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revealing and Mitigating Over-Attention in Knowledge Editing

Authors: Pinzheng Wang, Zecheng Tang, Keyan Zhou, Juntao Li, Qiaoming Zhu, Min Zhang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on five frequently used strong LLMs demonstrate the effectiveness of our method, where SADR can significantly mitigate Specificity Failure in the predominant knowledge editing tasks." "Table 1 illustrates a significant specificity failure when the edited subject occurs in the context, with the edited model incorrectly outputting the edited object in over 50% of test cases."
Researcher Affiliation | Academia | "1. School of Computer Science and Technology, Soochow University; 2. Key Laboratory of Data Intelligence and Advanced Computing, Soochow University"
Pseudocode | Yes | "Algorithm 1: The MEMIT Algorithm"
Open Source Code | Yes | "Code, dataset and an interactive demo notebook: https://github.com/PinzhengWang322/Reveal_Attention_Drift."
Open Datasets | Yes | "The dataset we use is a mixture of counterfact datasets from Meng et al. (2022) and Zhang et al. (2024a)." "Due to the limited availability of datasets that satisfy the required fields for our tasks, we combine COUNTERFACT (Meng et al., 2022) and WikiDatacounterfact (Zhang et al., 2024a) with 1,683 factual statements as the testing data."
Dataset Splits | Yes | "We test [20, 40, 80] optimization steps with restraining weights γ set at [5e-3, 1e-2, 4e-2, 8e-2] on the validation split." "When testing the trade-off between generalization and specificity, we randomly sample 500 data points for evaluation."
Hardware Specification | Yes | "All experiments are conducted on eight NVIDIA A100 (40GB) GPUs, with individual edits taking approximately 20 to 80 seconds on a single GPU."
Software Dependencies | Yes | "Implemented using EasyEdit. We build the human evaluation interface with the open-source Python web library Django."
Experiment Setup | Yes | "The learning rate is 0.5, optimization steps are 20, and the KL factor ω is 0.0625 across various models." "For GPT-J-6b, we edit layer 5, with optimization steps of 80 and a controlling weight γ = 1e-2 for the SADR method."
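For illustration, the tuning described in the Dataset Splits and Experiment Setup rows amounts to a 3 × 4 grid search over optimization steps and the restraining weight γ. A minimal sketch follows; the `evaluate_on_validation` callback is hypothetical and not part of the paper's released code:

```python
from itertools import product

# Grid quoted in the report: optimization steps crossed with
# restraining weights gamma, scored on a validation split.
OPT_STEPS = [20, 40, 80]
GAMMAS = [5e-3, 1e-2, 4e-2, 8e-2]

def grid_search(evaluate_on_validation):
    """Return the (steps, gamma) pair with the best validation score.

    `evaluate_on_validation(steps=..., gamma=...)` is a placeholder for
    whatever metric the authors optimize (e.g. edit specificity).
    """
    best_config, best_score = None, float("-inf")
    for steps, gamma in product(OPT_STEPS, GAMMAS):  # 3 x 4 = 12 configs
        score = evaluate_on_validation(steps=steps, gamma=gamma)
        if score > best_score:
            best_config, best_score = (steps, gamma), score
    return best_config, best_score
```

Under this reading, the reported GPT-J-6b setting (80 steps, γ = 1e-2) is simply the grid point that scored best on the validation split.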