Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery

Authors: Shiji Zhou, Tianbai Yu, Zhi Zhang, Heng Chang, Xiao Zhou, Dong Wu, Han Zhao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, our extensive experiments show that the proposed algorithm achieves better tradeoff results than existing baselines. We conducted comprehensive experiments on tasks including image classification and image generation. Both numerical and visual results demonstrated significant improvements, effectively proving that our method can fully optimize the unlearning objective while maintaining utility to the greatest extent.
Researcher Affiliation	Collaboration	1 Institute of Artificial Intelligence, Beihang University; 2 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University; 3 University of Illinois at Urbana-Champaign; 4 University of Amsterdam; 5 Tsinghua University; 6 Yan Tron Technology Co., Ltd.
Pseudocode	Yes	Algorithm 1 Efficient Utility-Preserving Machine Unlearning (EUPMU)
Open Source Code	Yes	Codes are available at https://github.com/anseryuer/EUPMU-Efficient-Utility-Preserving-Machine-Unlearning.
Open Datasets	Yes	We focus on DDPM [31] and SD [55] models to prevent the generation of specific object classes, and the experiments are conducted on CIFAR-10 and Imagenette [33], respectively. We also consider concept-wise forgetting in SD to erase instance & style concepts and NSFW (not safe for work) content. All numerical results are the mean value over 5 independent trials. All experiments are carried out on two A100 GPUs.
Dataset Splits	Yes	For classification, the MIA setup, train/val/test splits, and metric computation follow SalUn. ... Tables 7, 8, 9, 10, 11, 12, 13, 14, and 15 report random-data forgetting results. Numbers in parentheses are the absolute gap to the retrain model for that metric. Across CIFAR-10, CIFAR-100, and Tiny-ImageNet-200, the Avg. Gap column quantifies the mean distance between each method and an ideal retrain; smaller is better.
Hardware Specification	Yes	All experiments are carried out on two A100 GPUs.
Software Dependencies	No	The paper mentions using the Adam optimizer, but no specific versions for general software libraries like Python, PyTorch, or CUDA are provided.
Experiment Setup	Yes	EUPMU is trained for 1,000 iterations with a learning rate of 1e-4, an α value of 1e-3, and a batch size of 128. The sampling settings are 1,000 timesteps and a conditional scaling of 2.0. β is searched within [1e-4, 1e-2] and ε within [1e+1, 5e+3]. For SD, EUPMU is trained with the Adam optimizer for 5 epochs at a learning rate of 1e-5, with α set to 0.01, a batch size of 8, β set to 1e-4, and ε set to 1e+3. Sampling uses DDIM with 100 timesteps and a conditional scaling of 7.5.
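The Dataset Splits row defines the parenthesized numbers as per-metric absolute gaps to the retrained model and the Avg. Gap column as their mean. The following is a minimal Python sketch of that computation, assuming an unweighted mean over whichever metrics a table reports. The metric names (UA, RA, TA, MIA) and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Mapping


def absolute_gaps(method: Mapping[str, float], retrain: Mapping[str, float]) -> Dict[str, float]:
    """Per-metric |method - retrain|, i.e. the numbers shown in parentheses."""
    missing = set(retrain) - set(method)
    if missing:
        raise KeyError(f"method is missing metrics: {sorted(missing)}")
    return {name: abs(method[name] - retrain[name]) for name in retrain}


def avg_gap(method: Mapping[str, float], retrain: Mapping[str, float]) -> float:
    """Assumed Avg. Gap: unweighted mean of the per-metric absolute gaps."""
    gaps = absolute_gaps(method, retrain)
    return sum(gaps.values()) / len(gaps)


# Hypothetical placeholder values for illustration only (not reported results).
retrain = {"UA": 5.0, "RA": 100.0, "TA": 94.0, "MIA": 13.0}
method = {"UA": 3.0, "RA": 99.0, "TA": 93.0, "MIA": 11.0}

print(avg_gap(method, retrain))  # → 1.5
```

Because smaller is better, a method that exactly matches the retrained model on every metric would score 0.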
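The Experiment Setup row lists two hyperparameter settings. The sketch below records them as Python dataclasses so a rerun can be checked against the report. It rests on several assumptions: the first setting is taken to be DDPM on CIFAR-10 (the excerpt only labels the second as SD); the field name `guidance_scale` is a hypothetical stand-in for "conditional scaling"; and the helper `log_grid` is a hypothetical log-spaced grid, since the report does not say how β and ε are searched.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EUPMUConfig:
    optimizer: Optional[str]  # None where the report names no optimizer
    learning_rate: float
    alpha: float
    batch_size: int
    iterations: Optional[int] = None
    epochs: Optional[int] = None
    # Each of beta/epsilon is either a fixed value or a search range.
    beta: Optional[float] = None
    beta_range: Optional[Tuple[float, float]] = None
    epsilon: Optional[float] = None
    epsilon_range: Optional[Tuple[float, float]] = None
    sampler: Optional[str] = None
    sampling_steps: int = 1000
    guidance_scale: float = 1.0  # hypothetical name for "conditional scaling"


# Assumed to be the DDPM / CIFAR-10 setting.
DDPM_CIFAR10 = EUPMUConfig(
    optimizer=None, learning_rate=1e-4, alpha=1e-3, batch_size=128,
    iterations=1000, beta_range=(1e-4, 1e-2), epsilon_range=(1e1, 5e3),
    sampling_steps=1000, guidance_scale=2.0,
)

SD = EUPMUConfig(
    optimizer="Adam", learning_rate=1e-5, alpha=0.01, batch_size=8,
    epochs=5, beta=1e-4, epsilon=1e3,
    sampler="DDIM", sampling_steps=100, guidance_scale=7.5,
)


def log_grid(low: float, high: float, n: int) -> List[float]:
    """Hypothetical search helper: n log-spaced candidates from low to high."""
    if n < 2 or low <= 0 or high <= low:
        raise ValueError("need n >= 2 and 0 < low < high")
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]


beta_candidates = log_grid(*DDPM_CIFAR10.beta_range, n=3)
```

A log-spaced grid is a common choice when a range spans several orders of magnitude, as both β and ε do here; a random log-uniform search would be an equally plausible reading of "searched within".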