Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning

Authors: Changsheng Wang, Yihua Zhang, Jinghan Jia, Parikshit Ram, Dennis Wei, Yuguang Yao, Soumyadeep Pal, Nathalie Baracaldo, Sijia Liu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art unlearning methods, including negative preference optimization (NPO) and representation misdirection for unlearning (RMU). Notably, ILU achieves superior unlearning robustness across diverse downstream fine-tuning scenarios (e.g., math, paraphrase detection, and sentiment analysis) while preserving the fine-tuning performance. Our experiments and codes are available at https://github.com/OPTMLGroup/Unlearn-ILU.
Researcher Affiliation | Collaboration | 1 Michigan State University, 2 IBM Research. Correspondence to: Changsheng Wang <EMAIL>, Sijia Liu <EMAIL>.
Pseudocode | No | The paper describes mathematical formulations and algorithmic steps in prose, for example, under Section 4 'Promoting Invariance in LLM Unlearning' and in equations (1)–(5). However, there are no explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Our experiments and codes are available at https://github.com/OPTMLGroup/Unlearn-ILU.
Open Datasets | Yes | Extensive experiments on the WMDP and MUSE benchmarks reveal that ILU significantly outperforms state-of-the-art unlearning methods, including negative preference optimization (NPO) and representation misdirection for unlearning (RMU). Notably, ILU achieves superior unlearning robustness across diverse downstream fine-tuning scenarios (e.g., math, paraphrase detection, and sentiment analysis) while preserving the fine-tuning performance. Our experiments and codes are available at https://github.com/OPTMLGroup/Unlearn-ILU.
Dataset Splits | Yes | We use the forget set provided in the WMDP (Li et al., 2024) benchmark, which contains a large collection of biology-related articles. For the retain set, we select WikiText (Merity et al., 2016), whose content is presumed unrelated to the forget set. Our baseline model is Zephyr-7B-beta, as specified in the WMDP benchmark.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions the models used (e.g., Zephyr-7B-beta, LLaMA-3-8B-Instruct).
Software Dependencies | No | The paper mentions specific models like "Zephyr-7B-beta" and "LLaMA-3-8B-Instruct" but does not list any specific software dependencies (libraries, frameworks, or solvers) with their version numbers.
Experiment Setup | Yes | For unlearning, we first employ the NPO method with 2000 optimization steps, gradient accumulation every 4 steps, and a context length of 1024 tokens for each data chunk. The learning rate is chosen via a grid search in [10⁻⁶, 10⁻⁵], while the parameter γ appearing before the retain loss is selected from [1, 2.5]. We choose the final unlearned model as the one that preserves performance closest to the original Zephyr-7B-beta. We also employ the RMU method, using a batch size of 4 and sampling 800 total data instances, each with 512 tokens per data chunk. The learning rate is tuned within [10⁻⁵, 10⁻³], and the parameter α appearing before the retain loss is searched in [1, 10]. ILU integrates invariance regularization into the loss function. We tune the key parameter λ in [0.1, 2.0]. We set the batch size to 48 for each unlearning step when using a single dataset on both NPO-based and RMU-based ILU. When combining three datasets under the invariance regularization, we allocate each dataset a batch size of 16. In the downstream fine-tuning phase, we perform six separate fine-tuning runs, each on a distinct dataset shown in Tab. 1. For GSM8K, we set the batch size to 10 and tune the learning rate within the range [10⁻⁴, 10⁻⁶]. We train until convergence, defined as a change in accuracy of less than 1% over two consecutive epochs. For each of the remaining datasets, we adopt a batch size of 64 with a learning rate in [10⁻⁴, 10⁻⁶], following the same convergence criterion.
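The convergence criterion quoted above (stop once accuracy changes by less than 1% over two consecutive epochs) can be sketched as a small check alongside the stated hyperparameter grids. This is a minimal illustration, not the authors' code; the function name `has_converged` and the discrete grid points chosen within the quoted search ranges are assumptions.

```python
def has_converged(acc_history, tol=0.01, window=2):
    """Return True once accuracy has changed by less than `tol`
    over `window` consecutive epochs (the criterion quoted above)."""
    if len(acc_history) < window + 1:
        return False  # not enough epochs observed yet
    recent = acc_history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tol for i in range(window))

# Illustrative grid points within the ranges quoted from the setup
# (the paper gives ranges, not the exact grids):
npo_lr_grid = [1e-6, 1e-5]          # NPO learning-rate search range
rmu_lr_grid = [1e-5, 1e-4, 1e-3]    # RMU learning-rate search range
ilu_lambda_grid = [0.1, 0.5, 1.0, 2.0]  # ILU invariance weight λ in [0.1, 2.0]
```

For example, an accuracy trace of [0.50, 0.62, 0.625, 0.628] satisfies the criterion (the last two epoch-to-epoch changes are below 1%), while [0.50, 0.62, 0.66] does not.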