Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Joint Knowledge Editing for Information Enrichment and Probability Promotion

Authors: Wenhang Shi, Yiren Chen, Shuqing Bian, Xinyi Zhang, Zhe Zhao, Pengfei Hu, Wei Lu, Xiaoyong Du

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We rigorously evaluate JEEP by editing up to thousands of facts on various models, i.e., GPT-J (6B) and LLaMA (7B), and addressing diverse editing objectives, i.e., adding factual and counterfactual knowledge. In all tested scenarios, JEEP achieves best performances, validating the effectiveness of the revealings of our probe approach and the designs of our editing method." "We conduct extensive experiments involving edits ranging from 1 to 10,000 across various model architectures, including GPT-J (6B) and LLaMA (7B) (Wang and Komatsuzaki 2021; Touvron et al. 2023), and datasets such as zsRE and Multi-COUNTERFACT (Levy et al. 2017; Meng et al. 2022b). In all tested scenarios, JEEP consistently delivers the optimal performances, confirming the effectiveness of our methodological designs and validating our probe approach to identify critical editing stages."
Researcher Affiliation | Collaboration | Wenhang Shi¹, Yiren Chen², Shuqing Bian³, Xinyi Zhang¹*, Zhe Zhao³*, Pengfei Hu³, Wei Lu¹, Xiaoyong Du¹; ¹Renmin University of China, ²Peking University, ³Tencent; EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes its methods narratively and with mathematical equations (Eq. 1-12) but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code: https://github.com/Eric8932/JEEP
Open Datasets | Yes | "We extract 10,000 real-world factual pairs (x, y) from zsRE (Levy et al. 2017), a question-answering dataset." "We edit 10,000 samples from the Multi-COUNTERFACT dataset (Meng et al. 2022b)."
Dataset Splits | No | The paper uses 10,000 samples each from zsRE and Multi-COUNTERFACT for editing, but it does not explicitly state how these samples are split into training, validation, or test sets for developing or evaluating the editing method itself, beyond the number of samples edited.
Hardware Specification | No | The paper reports experiments on GPT-J (6B) and LLaMA (7B) models but gives no specific details about the hardware (e.g., GPU models, CPU types, memory) used to run them.
Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA) that would be needed to replicate the experiments.
Experiment Setup | No | The paper introduces coefficients for the loss terms (β, β', α) and for adaptive updates (γ, γ') as part of the method description, but concrete values for these hyperparameters, and other typical setup details such as learning rates, batch sizes, or number of epochs, are not given in the main text. It states that 'Implementation details are in Appendix D', suggesting these details are not in the main body of the paper.