Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On Effects of Steering Latent Representation for Large Language Model Unlearning
Authors: Huu-Tien Dang, Tin Pham, Hoang Thanh-Tung, Naoya Inoue
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Adaptive RMU significantly improves the unlearning performance compared to prior art while incurring no additional computational cost. Experimental results show that Adaptive RMU achieves higher drop-in-accuracy for forget knowledge, maintaining high performance on general knowledge, and enables effective unlearning for most layers without incurring additional computational overhead. |
| Researcher Affiliation | Academia | Dang Huu-Tien¹, Tin Pham¹, Hoang Thanh-Tung², and Naoya Inoue¹,³; ¹Japan Advanced Institute of Science and Technology; ²VNU University of Engineering and Technology, Vietnam; ³RIKEN |
| Pseudocode | Yes | Algorithm 1: Adaptive RMU pseudocode |
| Open Source Code | Yes | Our code is available at https://github.com/RebelsNLU-jaist/llm-unlearning. |
| Open Datasets | Yes | We use WMDP-Biology and WMDP-Cyber forget datasets as D_forget and Wikitext (Merity et al. 2022) as D_retain for unlearning the LLM. Unlearned models are evaluated on WMDP Q&A datasets and MMLU (Hendrycks et al. 2021). |
| Dataset Splits | No | The paper mentions using 'WMDP-Biology and WMDP-Cyber forget datasets as D_forget and Wikitext (Merity et al. 2022) as D_retain' for unlearning and 'WMDP Q&A datasets and MMLU (Hendrycks et al. 2021)' for evaluation. However, it does not specify explicit percentages, counts, or a methodology for splitting these datasets into training, validation, or test sets within the scope of this research. |
| Hardware Specification | Yes | Two NVIDIA A40s with 90GB GPU were used to run the experiments. |
| Software Dependencies | No | The paper mentions 'AdamW' as an optimizer but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version) that were used for implementation. |
| Experiment Setup | Yes | Models were fine-tuned using AdamW (Loshchilov and Hutter 2019) with learning rate η = 5e-5, batch size of 4, max sequence length of 512 for WMDP-Biology and 768 for WMDP-Cyber, with T = 500 gradient update steps. The retain weight α = 1200. For the baseline RMU, we follow the previous work and let c = 6.5. We grid search for unlearn layer l from the third to the last layer. For the Adaptive RMU, we grid search for the scaling factor β ∈ {2, 3, 5, 10}. We report the performances of Adaptive RMU models with β = 5. We update the parameters of three layers {l, l−1, l−2} of the model. |
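
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the class, field, and function names are hypothetical. It also makes three assumptions about the extracted text: the learning rate is read as 5e-5, the updated layers as {l−2, l−1, l}, and layer indices as 1-based, so that "from the third to the last layer" means l = 3, ..., L.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UnlearnConfig:
    """Values reported in the Experiment Setup row; all names are hypothetical."""

    learning_rate: float = 5e-5          # eta (assumed reading of the extracted "5e 5")
    batch_size: int = 4
    max_len_bio: int = 512               # max sequence length, WMDP-Biology
    max_len_cyber: int = 768             # max sequence length, WMDP-Cyber
    num_steps: int = 500                 # T gradient update steps
    retain_weight: float = 1200.0        # alpha
    rmu_coeff: float = 6.5               # c, baseline RMU only
    beta_grid: Tuple[int, ...] = (2, 3, 5, 10)  # Adaptive RMU scaling factors searched
    beta_reported: int = 5               # beta used for the reported results


def unlearn_layer_grid(num_layers: int) -> List[int]:
    """Candidate unlearn layers l, assuming 1-based indices from 3 to num_layers."""
    return list(range(3, num_layers + 1))


def updated_layers(l: int) -> Tuple[int, int, int]:
    """Layers whose parameters are updated for unlearn layer l: (l-2, l-1, l)."""
    if l < 3:
        raise ValueError("unlearn layer l must be at least 3 under 1-based indexing")
    return (l - 2, l - 1, l)


if __name__ == "__main__":
    cfg = UnlearnConfig()
    print(cfg)
    print(updated_layers(7))
```

Keeping the searched grid and the reported value side by side makes it explicit that β = 5 is one point of the search described in the row, not a separately tuned setting.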