Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
Authors: Changsheng Wang, Yihua Zhang, Jinghan Jia, Parikshit Ram, Dennis Wei, Yuguang Yao, Soumyadeep Pal, Nathalie Baracaldo, Sijia Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art unlearning methods, including negative preference optimization (NPO) and representation misdirection for unlearning (RMU). Notably, ILU achieves superior unlearning robustness across diverse downstream fine-tuning scenarios (e.g., math, paraphrase detection, and sentiment analysis) while preserving the fine-tuning performance. Our experiments and codes are available at https://github.com/OPTML-Group/Unlearn-ILU. |
| Researcher Affiliation | Collaboration | 1Michigan State University 2 IBM Research. Correspondence to: Changsheng Wang <EMAIL>, Sijia Liu <EMAIL>. |
| Pseudocode | No | The paper describes mathematical formulations and algorithmic steps in prose, for example, under Section 4 'Promoting Invariance in LLM Unlearning', and in equations (1), (2), (3), (4), and (5). However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Our experiments and codes are available at https://github.com/OPTMLGroup/Unlearn-ILU. |
| Open Datasets | Yes | Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art unlearning methods, including negative preference optimization (NPO) and representation misdirection for unlearning (RMU). |
| Dataset Splits | Yes | We use the forget set provided in the WMDP (Li et al., 2024) benchmark, which contains a large collection of biology-related articles. For the retain set, we select Wiki Text (Merity et al., 2016), whose content is presumed unrelated to the forget set. Our baseline model is Zephyr-7B-beta, as specified in the WMDP benchmark. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions the models used (e.g., Zephyr-7B-beta, LLaMA-3-8B-Instruct). |
| Software Dependencies | No | The paper mentions specific models like "Zephyr-7B-beta" and "LLaMA-3-8B-Instruct" but does not list any specific software dependencies (libraries, frameworks, or solvers) with their version numbers. |
| Experiment Setup | Yes | For unlearning, we first employ the NPO method with 2000 optimization steps, gradient accumulation every 4 steps, and a context length of 1024 tokens for each data chunk. The learning rate is chosen via a grid search in [10⁻⁶, 10⁻⁵], while the parameter γ appearing before the retain loss is selected from [1, 2.5]. We choose the final unlearned model as the one that preserves performance closest to the original Zephyr-7B-beta. We also employ the RMU method, using a batch size of 4 and sampling 800 total data instances, each with 512 tokens per data chunk. The learning rate is tuned within [10⁻⁵, 10⁻³], and the parameter α appearing before the retain loss is searched in [1, 10]. ILU integrates invariance regularization into the loss function. We tune the key parameter λ in [0.1, 2.0]. We set the batch size to 48 for each unlearning step when using a single dataset on both NPO-based and RMU-based ILU. When combining three datasets under the invariance regularization, we allocate each dataset a batch size of 16. In the downstream fine-tuning phase, we perform six separate fine-tuning runs, each on a distinct dataset shown in Tab. 1. For GSM8K, we set the batch size to 10 and tune the learning rate within the range [10⁻⁶, 10⁻⁴]. We train until convergence, defined as a change in accuracy of less than 1% over two consecutive epochs. For each of the remaining datasets, we adopt a batch size of 64 with a learning rate in [10⁻⁶, 10⁻⁴], following the same convergence criterion. |
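The convergence criterion quoted in the Experiment Setup row (train until accuracy changes by less than 1% over two consecutive epochs) can be sketched as a small stopping check. This is an illustrative reading of the stated rule, not code from the authors' repository; the function name and signature are assumptions.

```python
def has_converged(acc_history, tol=0.01, patience=2):
    """Illustrative stopping rule: converged once the epoch-to-epoch
    change in accuracy stays below `tol` (1%) for `patience`
    consecutive epoch transitions."""
    if len(acc_history) < patience + 1:
        return False  # not enough epochs observed yet
    recent = acc_history[-(patience + 1):]
    return all(
        abs(recent[i + 1] - recent[i]) < tol
        for i in range(patience)
    )

# Example: accuracy plateaus after epoch 2, so training would stop.
history = [0.50, 0.62, 0.625, 0.628]
print(has_converged(history))  # True: last two changes are 0.005 and 0.003
```

A training loop would append each epoch's validation accuracy to `history` and break once `has_converged(history)` returns `True`.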