Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Authors: Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.
Researcher Affiliation | Collaboration | 1 Johns Hopkins University, 2 Amazon
Pseudocode | Yes | Algorithm 1: Primal-Dual Solver with Warm Starting (Problem (2))
Open Source Code | Yes | Moreover, our algorithm is implemented in this repository and made public at https://github.com/locuslab/open-unlearning.
Open Datasets | Yes | We evaluated our unlearning methodology on two established benchmarks: TOFU and MUSE [29, 39, 12].
Dataset Splits | Yes | In the main experiments, we choose to forget the subset Forget10 and defer Forget05 and Forget01 to the Supplementary Material. The MUSE benchmark focuses on unlearning in two real-world contexts: Books and News.
Hardware Specification | Yes | For the experiments using the LLAMA 3.2 1B/3B models, we use a single A100 GPU with 40 GB of memory. For all other models, we use 8 A100 80 GB GPUs within a p4de.xlarge AWS EC2 instance.
Software Dependencies | No | A paged_adamw_32bit optimizer with a learning rate of 10^-5, using torch with precision bfloat16. The paper mentions 'torch' but does not provide a specific version number for this or any other software library.
Experiment Setup | Yes | A paged_adamw_32bit optimizer with a learning rate of 10^-5, using torch with precision bfloat16. The rest of the fine-tuning hyperparameters are reflected in Table 4.
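
The optimizer settings quoted in the Experiment Setup row can be collected in a small configuration object. The following is a minimal sketch, not the authors' code: the class name UnlearningRunConfig and the helper to_trainer_kwargs are hypothetical, and the learning rate assumes that the extracted "10 5" stands for 10^-5. The field names optim, learning_rate, and bf16 are chosen to match keyword arguments accepted by Hugging Face transformers TrainingArguments; whether the linked repository passes them this way is an assumption. The remaining hyperparameters are in Table 4 of the paper and are not reproduced here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UnlearningRunConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""

    optim: str = "paged_adamw_32bit"  # optimizer name as quoted from the paper
    learning_rate: float = 1e-5       # assumption: "10 5" in the extract means 10^-5
    bf16: bool = True                 # "torch with precision bfloat16"


def to_trainer_kwargs(cfg: UnlearningRunConfig) -> dict:
    """Return the settings as a plain dict of keyword arguments."""
    return asdict(cfg)


config = UnlearningRunConfig()
trainer_kwargs = to_trainer_kwargs(config)
print(trainer_kwargs)
```

Because the dataclass is frozen, the quoted values cannot be changed by accident after construction; a run with different settings would create a new instance, for example UnlearningRunConfig(learning_rate=2e-5).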