Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

Authors: Jie Ren, Zhenwei Dai, Xianfeng Tang, Yue Xing, Shenglai Zeng, Jingying Zeng, Qiankun Peng, Samarth Varshney, Suhang Wang, Qi He, Charu Aggarwal, Hui Liu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the effectiveness of both SA and SU. Our code is available at github.com/renjie3/sa_su. [...] In this section, we first present the attack results of TOFU on vanilla unlearning methods across different models in Section 5.2. Then we show the results of improved robustness by SU in Section 5.3. In Section 5.4, we compare SU with two methods which are adapted from defenses against data poisoning attacks. Finally, we conduct the ablation studies in Section 5.5.
Researcher Affiliation | Collaboration | Jie Ren1, Zhenwei Dai2, Xianfeng Tang2, Yue Xing1, Shenglai Zeng1, Hui Liu2, Jingying Zeng2, Qiankun Peng2, Samarth Varshney2, Suhang Wang3, Qi He2, Charu C. Aggarwal4, Hui Liu1. Affiliations: 1 Michigan State University, 2 Amazon, 3 The Pennsylvania State University, 4 IBM T. J. Watson Research Center
Pseudocode | No | The paper describes the methodology in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at github.com/renjie3/sa_su.
Open Datasets | Yes | Models, datasets, and unlearning methods: We use LLaMA 3.1 (8B) [49] and Mistral v0.3 (7B) [50] for our experiments on two datasets: TOFU (QA-format) and RWKU (non-QA-format). [...] TOFU (dataset), MIT license, https://github.com/locuslab/tofu [...] RWKU (dataset), license not provided, https://github.com/jinzhuoran/RWKU/tree/main
Dataset Splits | Yes | TOFU is a synthetic dataset containing fake books and authors. LLMs are first fine-tuned on the full dataset, after which a subset is designated as forgetting data for unlearning, while the remaining data serves as retaining data for utility. [...] p_tgt represents the proportion of forgetting data within the entire synthetic dataset. [...] We unlearn the identity of one celebrity, use a second celebrity's corpus as retaining data, and evaluate utility on a third celebrity.
Hardware Specification | Yes | All experiments are conducted on H100 GPUs.
Software Dependencies | No | The paper mentions using specific models (LLaMA 3.1, Mistral v0.3) and techniques (LoRA), and implementations based on other papers (SimNPO), but does not provide specific version numbers for software libraries or environments (e.g., Python, PyTorch versions).
Experiment Setup | Yes | We set the transformation probability p in Eq. (4) to 0.33. [...] LoRA is used with a rank of 8. [...] For RWKU, our implementation builds on [13] and [46], with fine-tuning details in Appendix C. [...] Table 6: Hyper parameters (columns: Dataset, Model, L_u, Hyper-parameter value) [...] Table 7: Hyper parameters (columns: Dataset, L_u, Model, Hyper-parameter value) [...] We set epoch as 10 for all the experiments following [13].
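
The Dataset Splits and Experiment Setup entries above can be made concrete with a minimal Python sketch. It is not taken from the authors' repository: the names ReportedSetup and split_forget_retain are hypothetical, and selecting the forgetting subset at random is an assumption, since the quoted text does not say how the subset is chosen. Only the three numeric values (p = 0.33, LoRA rank 8, 10 epochs) come from the quoted excerpts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup entry; field names are hypothetical."""
    transformation_prob: float = 0.33  # p in Eq. (4) of the paper
    lora_rank: int = 8                 # "LoRA is used with a rank of 8"
    epochs: int = 10                   # "We set epoch as 10 for all the experiments"


def split_forget_retain(samples, p_tgt, seed=0):
    """Designate a fraction p_tgt of samples as forgetting data.

    The remaining samples serve as retaining data. Random selection is an
    assumption of this sketch, not a documented detail of the paper.
    """
    if not 0.0 <= p_tgt <= 1.0:
        raise ValueError("p_tgt must lie in [0, 1]")
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_forget = round(p_tgt * len(samples))
    forget_idx = set(indices[:n_forget])
    # Preserve the original ordering inside each subset.
    forget = [s for i, s in enumerate(samples) if i in forget_idx]
    retain = [s for i, s in enumerate(samples) if i not in forget_idx]
    return forget, retain


if __name__ == "__main__":
    setup = ReportedSetup()
    forget, retain = split_forget_retain(list(range(100)), p_tgt=0.1)
    print(setup, len(forget), len(retain))
```

The fixed seed keeps the split repeatable across runs, which matters here because the scorecard notes that software versions are not reported and other sources of variation are therefore harder to rule out.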