Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LLM Unlearning via Neural Activation Redirection

Authors: William Shen, Xinchi Qiu, Meghdad Kurmanji, Alexandru-Andrei Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments to evaluate LUNAR's performance, focusing on the following research questions:
Researcher Affiliation | Collaboration | 1 University of Cambridge; 2 Meta; 3 UCL Centre for Artificial Intelligence; 4 FAIR at Meta
Pseudocode | Yes | A Algorithm: Algorithm 1, LUNAR: Unlearning via Neural Activation Recalibration
Open Source Code | No | We will make our code base available upon paper acceptance.
Open Datasets | Yes | For the former, we use TOFU [31] and PISTOL [41] datasets; for the latter, we use the common knowledge dataset provided by [31]. To redirect activations, we use either harmful prompts dataset [3]
Dataset Splits | Yes | We optimize L_LUNAR (Eq. 6) using all forget data points and an equal number of randomly sampled retain data points, a setting that we find to be sufficient empirically.
Hardware Specification | Yes | We have conducted all our experiment with single Nvidia H100 GPU.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers. It mentions that code will be made available upon acceptance but does not detail the software environment or library versions used in the experiments.
Experiment Setup | Yes | Table 5: Learning rates of unlearning methods across settings and base models. All baseline unlearning methods exhibit high sensitivity to learning rate tuning, necessitating extensive effort to avoid minimal unlearning or catastrophic collapse of the retain model utility. Each method requires individualized tuning for every model and forget dataset to achieve optimal performance; specifically, learning rates were tuned to minimize the ROUGE1 score on the forget dataset, while ensuring that retain model utility, measured by the ROUGE1 score on the retain dataset, remains above circa 0.8.
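
The learning-rate selection rule quoted in the Experiment Setup row can be read as a constrained choice: among candidate learning rates, keep those whose retain ROUGE1 stays at or above the threshold, then pick the one with the lowest forget ROUGE1. The snippet below is a minimal sketch of that reading, not the authors' code. The function name `select_learning_rate`, the shape of the `sweep` dictionary, the use of `>=` for "above circa 0.8", and all numbers in the example are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def select_learning_rate(
    results: Dict[float, Tuple[float, float]],
    min_retain_rouge1: float = 0.8,
) -> Optional[float]:
    """Pick a learning rate from a sweep (hypothetical helper).

    `results` maps each candidate learning rate to a pair
    (forget_rouge1, retain_rouge1) measured after unlearning.
    """
    # Keep only runs whose retain utility stays at or above the threshold.
    feasible = {
        lr: scores
        for lr, scores in results.items()
        if scores[1] >= min_retain_rouge1
    }
    if not feasible:
        # No candidate preserved enough retain utility.
        return None
    # Among feasible runs, minimize ROUGE1 on the forget set.
    return min(feasible, key=lambda lr: feasible[lr][0])


# Illustrative numbers only; they are not taken from the paper.
sweep = {
    1e-5: (0.62, 0.95),  # barely unlearns
    5e-5: (0.18, 0.84),  # unlearns while retaining utility
    1e-4: (0.03, 0.41),  # retain utility collapses
}
best = select_learning_rate(sweep)
print(best)
```

In this made-up sweep the largest learning rate forgets the most but violates the retain constraint, so the middle candidate is selected. That is the trade-off between minimal unlearning and utility collapse that the quoted passage describes.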