Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reinforced Lifelong Editing for Language Models

Authors: Zherui Li, Houcheng Jiang, Hao Chen, Baolong Bi, Zhenhong Zhou, Fei Sun, Junfeng Fang, Xiang Wang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our extensive empirical evaluation across several LLMs demonstrates that RLEdit outperforms existing methods in lifelong editing with superior effectiveness and efficiency, achieving a 59.24% improvement while requiring only 2.11% of the time compared to most approaches. Our code is available at: https://github.com/zhrli324/RLEdit. ... We conduct extensive experiments to evaluate both the effectiveness and efficiency of our approach. Additionally, we perform ablation studies to analyze the contribution of each component in RLEdit, which can be found in Appendix B.1.
Researcher Affiliation Academia 1Beijing University of Posts and Telecommunications 2University of Science and Technology of China 3Institute of Computing Technology, Chinese Academy of Sciences 4National University of Singapore.
Pseudocode Yes The pseudo-code is provided in Algorithm 1. ... Algorithm 1 RLEdit Hypernetwork Training ... The corresponding pseudocode for RLEdit's editing algorithms is presented in Algorithm 2.
Open Source Code Yes Our code is available at: https://github.com/zhrli324/RLEdit.
Open Datasets Yes We evaluate RLEdit on three widely-used datasets: ZsRE (Levy et al., 2017), FEVER (Thorne et al., 2018), and CounterFact (Meng et al., 2022). Following previous evaluation standards (Mitchell et al., 2022a; Meng et al., 2022; 2023)
Dataset Splits Yes We randomly sampled 8,000 knowledge samples from ZsRE and FEVER respectively, performing edits over 400 batches with 20 knowledge samples per batch (denoted as a 400 × 20 configuration throughout this paper). ... For locate-then-edit methods, we use the version from MEMIT; for hypernetwork-based methods, we use the version from MEND, where ZsRE is divided into training and test sets for hypernetwork training and editing performance evaluation respectively.
Hardware Specification Yes Most experiments were conducted on a single NVIDIA A100 (80GB) GPU.
Software Dependencies No The paper does not provide specific version numbers for software dependencies such as programming languages or libraries.
Experiment Setup Yes For the hyperparameters in RLEdit training and editing, we set the memory backtracking decay factor µ to 0.95, the backtracking depth k to 10, the regularization coefficient η to 1e-4 and the discount factor γ to 1 in the total reward formula. Additionally, the initial learning rate was set to 1e-6, while the meta-learning rate was set to 1e-5. The specific hyperparameter configurations for different models and datasets are shown in Table 3.
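The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration for reference. This is a minimal sketch: the key names below are illustrative and are not RLEdit's actual identifiers; only the numeric values come from the quoted paper text.

```python
# Hypothetical config names; values taken from the quoted Experiment Setup.
rledit_config = {
    "memory_backtracking_decay_mu": 0.95,  # µ: memory backtracking decay factor
    "backtracking_depth_k": 10,            # k: backtracking depth
    "regularization_coeff_eta": 1e-4,      # η: regularization coefficient
    "discount_factor_gamma": 1.0,          # γ: discount factor in the total reward
    "initial_learning_rate": 1e-6,         # initial (editing) learning rate
    "meta_learning_rate": 1e-5,            # hypernetwork meta-learning rate
}

# Basic sanity check that the values match the reported setup.
assert rledit_config["discount_factor_gamma"] == 1.0
print(rledit_config["backtracking_depth_k"])  # → 10
```

Per the paper's quoted text, these are global defaults; model- and dataset-specific overrides are listed in its Table 3.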