Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Elastic Robust Unlearning of Specific Knowledge in Large Language Models
Authors: Yize Sui, Jing Ren, Wenjing Yang, Ruochun Jin, Liyang Xu, Xiyao Liu, J Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that ERU can significantly improve unlearning effectiveness while maintaining high utility performance. In particular, on the WMDP-Bio benchmark, ERU shows a 9% improvement over the second-best method and maintains 83% performance even under retraining attacks that fine-tune on 1,000 samples, significantly better than the baseline method. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, National University of Defense Technology; (2) School of Computer Science and Engineering, Central South University |
| Pseudocode | No | The paper describes the proposed Elastic Robust Unlearning (ERU) framework through theoretical formulations and detailed explanations of its components (elastic reward setting, refusal feature ablation, max-minimum optimization problem), but it does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | Due to some constraints, our code is currently not available for open access. Nonetheless, our design is thoroughly discussed in E. |
| Open Datasets | Yes | Our experiments cover unlearning tasks across four benchmarks: RWKU [27], MUSE [28], TOFU [29], and WMDP [30]... Furthermore, RFA is defined as an inference-time intervention that sets the refusal feature at each layer as its average activation on harmless prompts... where $D_{\text{harmful}}$ and $D_{\text{harmless}}$ are obtained by sampling 500 instructions from the AdvBench [38] and the Alpaca [39] datasets respectively. We fine-tune unlearned models on two datasets: (1) the retain set; (2) WikiText, a collection of documents on Wikipedia that overlap least with dangerous knowledge... In the MUSE benchmark, three key metrics are proposed to evaluate the effectiveness of unlearning, namely VerbMem (no verbatim memory), KnowMem (no knowledge memory) and PrivLeak (no privacy disclosure). These metrics, from the perspective of the data owner, verify that the model does not retain any information related to the forget set after unlearning. |
| Dataset Splits | Yes | For TOFU, we explore two unlearning scenarios, termed Forget05 and Forget10, representing forget set sizes of 5% and 10%, respectively... All articles are randomly split into a forget set $D_f$, a retain set $D_r$, and a holdout set $D_h$... The retain set consists of the remaining question-answering pairs from the fictional authors... fine-tuning with only 10 samples of the retain set... after fine-tuning with 1,000 samples. |
| Hardware Specification | Yes | Our experiments evaluating unlearning effectiveness, utility preservation, and unlearning robustness were conducted with two A100 GPUs. |
| Software Dependencies | No | The paper mentions using 'AdamW' as an optimizer but does not specify version numbers for any software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The number of training epochs for all unlearning methods is uniformly set to 3, and the learning rate is selected for each method in the range of 1e-8 to 1e-5 through grid search. Similarly, for our ERU we conduct a grid search for β in the range [0.5, 1.0] and α in the range [5e-2, 0.2]. We use AdamW with a 20-step warm-up during training... For MUSE-News, we train for 10 epochs at a learning rate of 1e-5... For all baseline methods, we fixed the batch size to 32, set the learning rate to 1e-5, and fine-tuned the target LLM for 10 epochs with the AdamW optimizer. Table 4: Hyperparameters used for refusal feature adversarial training. Table 5: Hyperparameters used for finetuning for relearning attack. |
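
The Experiment Setup row quotes ranges for the grid search but not the individual values tried, and the paper's code is not released. The following is a minimal sketch of how such a search space might be enumerated, not the authors' implementation. The epoch count (3), the 20-step warm-up, and the range endpoints come from the row above; the intermediate grid points, the linear warm-up shape, and the names `build_grid` and `warmup_lr` are assumptions made for illustration.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
NUM_EPOCHS = 3
WARMUP_STEPS = 20

# Hypothetical grid points: the report gives only the range endpoints
# (1e-8 to 1e-5, [0.5, 1.0], [5e-2, 0.2]), not the values actually tried.
LEARNING_RATES = [1e-8, 1e-7, 1e-6, 1e-5]
BETAS = [0.5, 0.75, 1.0]
ALPHAS = [5e-2, 0.1, 0.2]


def build_grid():
    """Return one config dict per (learning rate, beta, alpha) combination."""
    return [
        {
            "lr": lr,
            "beta": beta,
            "alpha": alpha,
            "epochs": NUM_EPOCHS,
            "warmup_steps": WARMUP_STEPS,
        }
        for lr, beta, alpha in product(LEARNING_RATES, BETAS, ALPHAS)
    ]


def warmup_lr(step, base_lr, warmup_steps=WARMUP_STEPS):
    """Linear warm-up (assumed shape): ramp up to base_lr, then hold it."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid), "configurations")
    print(grid[0])
```

Each dictionary returned by `build_grid()` would be passed to one training run, with the best configuration chosen on whatever validation criterion the study uses; the report does not state that criterion, so it is left out of the sketch.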