Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers

Authors: Xuyang Zhong, Haochen Luo, Chen Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct extensive experiments to evaluate our method for different applications, including image classification, image generation, and large language models. The results demonstrate that our method can enhance the efficacy and stability of multiple MU methods, achieving new state-of-the-art performance across diverse scenarios. We conduct ablation studies for further analysis.
Researcher Affiliation	Academia	Xuyang Zhong Department of Computer Science City University of Hong Kong EMAIL Haochen Luo Department of Computer Science City University of Hong Kong EMAIL Chen Liu Department of Computer Science City University of Hong Kong EMAIL
Pseudocode	Yes	Algorithm 1 Machine Unlearning with Shared Optimizer / Dual Optimizers
Open Source Code	Yes	Codes are available at https://github.com/CityUMLO/DualOptim.
Open Datasets	Yes	For CIFAR-10, CIFAR-100, and SVHN using ResNet-18, all baselines use the SGD optimizer with momentum of 0.9, weight decay of 5 × 10⁻⁴, and batch size of 128 if not specified. For Tiny ImageNet, Swin-T, the models are initialized from torchvision weight pre-trained on ImageNet. All baselines use the Adam optimizer with a weight decay of 5 × 10⁻⁴ and batch size of 128 if not specified. ... For Phi-1.5 and LLaMA 2, we utilize pre-trained models on the TOFU dataset [2] and conduct evaluations accordingly.
Dataset Splits	Yes	Experiments are conducted on (a) 10% random subset of CIFAR-10 using ResNet-18 and (b) 10% random subset of Tiny ImageNet using Swin-T. ... We consider three levels of unlearning tasks: to forget 1%, 5%, and 10% of the constructed data.
Hardware Specification	Yes	For training LLaMA 2 with DualOptim, we use two NVIDIA H20 GPUs with 96GB of memory each. For Phi-1.5, we use two NVIDIA RTX 6000 Ada GPUs with 48GB of memory each.
Software Dependencies	No	The paper mentions optimizers like SGD, Adam [10, 11], Lion [55], and Muon [56], but does not provide specific version numbers for these or for core programming languages/frameworks like Python or PyTorch.
Experiment Setup	Yes	Summaries of the hyperparameters for each method on each dataset are shown in Table 7-10. Note that the unspecified hyperparameters are the same as the default ones reported in their original papers. Table 7: Summary of hyperparameters for each method on unlearning 10% random subset of CIFAR-10. η is short for learning rate.
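
For readers trying to reproduce the setup quoted in the Open Datasets and Dataset Splits rows, the following is a minimal sketch, not the authors' code. It records the quoted baseline hyperparameters and draws a random forget subset. The names `BaselineConfig`, `RESNET18_BASELINE`, `SWIN_T_BASELINE`, and `split_forget_retain`, and the fixed seed, are assumptions made for illustration. The sampling procedure actually used in the paper may differ.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BaselineConfig:
    """Baseline settings as quoted in the table (class name is hypothetical)."""
    optimizer: str
    weight_decay: float
    batch_size: int
    momentum: Optional[float] = None  # only quoted for the SGD baselines


# CIFAR-10, CIFAR-100, SVHN with ResNet-18: SGD, momentum 0.9.
RESNET18_BASELINE = BaselineConfig("sgd", 5e-4, 128, momentum=0.9)
# Tiny ImageNet with Swin-T: Adam, no momentum value quoted.
SWIN_T_BASELINE = BaselineConfig("adam", 5e-4, 128)


def split_forget_retain(
    num_samples: int, forget_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into (forget, retain) index lists.

    The seed is an assumption; the table does not say how the random
    subset was drawn.
    """
    if not 0.0 < forget_fraction < 1.0:
        raise ValueError("forget_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    n_forget = round(num_samples * forget_fraction)
    return sorted(indices[:n_forget]), sorted(indices[n_forget:])


# The CIFAR-10 training set has 50,000 images; forget a 10% random subset.
forget_idx, retain_idx = split_forget_retain(50_000, 0.10)
```

The same helper covers the 1%, 5%, and 10% forget levels quoted for the TOFU experiments by changing `forget_fraction`.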