Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality
Authors: Chenlong Zhang, Zhuoran Jin, Hongbang Yuan, Jiaheng Wei, Tong Zhou, Kang Liu, Jun Zhao, Yubo Chen
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide both theoretical and empirical evidence demonstrating the effectiveness of RULE in achieving targeted unlearning without compromising model utility. Experimental results show that, with only 12% forget set and 8% synthesized boundary data, RULE outperforms existing baselines by up to 17.5% forget quality and 16.3% naturalness response while maintaining general utility, achieving forget-retain Pareto optimality. Remarkably, we further observe that RULE improves the naturalness of model outputs, enhances training efficiency, and exhibits strong generalization ability, generalizing refusal behavior to semantically related but unseen queries. ... We conduct extensive experiments to evaluate RULE's performance in unlearning quality, response naturalness, and utility. |
| Researcher Affiliation | Academia | 1 The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3 The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | D.1 RULE Implementation We show the implementation of RULE here. Algorithm 1: RULE: Reinforcement Unlearning with Two-Stage Optimization |
| Open Source Code | Yes | Codes are available at: https://github.com/chenlong-clock/RULE-Unlearn |
| Open Datasets | Yes | We evaluate on the RWKU [22] benchmark with llama3-8b-instruct [14] and llama3.1-8b-instruct [24]. RWKU is a real-world knowledge unlearning benchmark designed to test models' ability on specific knowledge. The dataset provides three types of knowledge probe questions for the forget set: FB, QA, and AA, used for unlearning effectiveness. For utility preservation, it includes two types of questions on a neighbor set to assess the impact of perturbation: FB and QA. ... We also conduct experiments on MUSE [45], which is a comprehensive unlearning benchmark that requires models to unlearn either news articles or book series. |
| Dataset Splits | Yes | RULE performs online-sampling-based reinforcement learning using only 12% forget set and 8% synthesized boundary data. ... ReBO_GRPO uses only 12.1% of D_f and 8.03% of D_r, in contrast to most baselines that require 100% of both. ... Table 8: Data usage statistics. The table shows the number of used queries for both Type-I and Type-II. |
| Hardware Specification | Yes | For RWKU, the RS (Rejection Steering) stage takes 0.033 hours (approximately 2 minutes) per target on 4 A100 GPUs. The ReBO (Refusal Boundary Optimization) phase further refines the model in just 0.467 hours per target using 4 A100 GPUs. |
| Software Dependencies | No | The paper does not explicitly state specific version numbers for general software dependencies like Python, PyTorch, or TensorFlow. It mentions specific LLM checkpoints like 'llama3-8b-instruct' and 'GPT-4o-mini' as models used, but these are not considered ancillary software dependencies with versions in the context of the prompt's requirements. |
| Experiment Setup | Yes | For all baselines and variants, we follow the same parameter settings as used in the original RWKU [22] paper to ensure fair comparison. Models are fine-tuned using the llama3-8b-instruct checkpoint under bf16 precision. We use cosine learning rate schedules and full-parameter tuning. Table 10 summarizes the key hyperparameters across different training stages. ... We use KL coefficient of 1e-2 and a learning rate of 2e-6 with 20 optimization steps for RWKU and 187 steps for MUSE. Table 10: Key training hyperparameters across methods (learning rate / batch size / epochs). GA: 6e-8 / 4 / 3.0; NPO: 2e-6 / 16 / 3.0; SimNPO: 1e-6 / 16 / 3.0; RS (RWKU): 4e-7 / 8 / 2.0; RS (MUSE): 4e-7 / 8 / 1.0. |
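
For readers attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The numbers below are copied from the quoted excerpt; everything else is an assumption made for illustration: the names `StageConfig`, `TABLE_10`, `REBO`, and `cosine_lr` are hypothetical and are not taken from the paper or its repository, and the schedule assumes no warmup and a floor of zero, neither of which the excerpt specifies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """One row of Table 10 as quoted above (hypothetical container)."""
    learning_rate: float
    batch_size: int
    epochs: float


# Values copied from the quoted Table 10; the dictionary keys are our own labels.
TABLE_10 = {
    "GA": StageConfig(6e-8, 4, 3.0),
    "NPO": StageConfig(2e-6, 16, 3.0),
    "SimNPO": StageConfig(1e-6, 16, 3.0),
    "RS_RWKU": StageConfig(4e-7, 8, 2.0),
    "RS_MUSE": StageConfig(4e-7, 8, 1.0),
}

# Settings quoted for the reinforcement stage: KL coefficient, learning rate,
# and number of optimization steps per benchmark.
REBO = {
    "kl_coef": 1e-2,
    "learning_rate": 2e-6,
    "steps": {"RWKU": 20, "MUSE": 187},
}


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Plain cosine decay from peak_lr to min_lr.

    Assumption: no warmup and min_lr = 0; the excerpt only says
    "cosine learning rate schedules".
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = REBO["steps"]["RWKU"]
    for step in (0, total // 2, total):
        print(step, cosine_lr(step, total, REBO["learning_rate"]))
```

Under these assumptions the learning rate starts at the quoted peak, reaches half of it at the midpoint, and decays to zero on the final step. The authors' released code remains the authoritative source for the actual schedule and any warmup.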