Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions

Authors: Siqiao Mu, Diego Klabjan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we demonstrate the superior performance of our algorithm compared to existing methods, within a new experimental framework that more accurately reflects unlearning user data in practice. We empirically demonstrate the privacy-utility-efficiency tradeoff of our unlearning algorithm and its superior performance compared to other algorithms for nonconvex functions, within a challenging new framework in which the unlearned data is not independently and identically distributed (i.i.d.) to the training or test set.
Researcher Affiliation	Academia	Siqiao Mu, Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, EMAIL; Diego Klabjan, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208, EMAIL
Pseudocode	Yes	Algorithm 1 A: R2D Learning Algorithm; Algorithm 2 U: R2D Unlearning Algorithm; Algorithm 3 Compute Checkpoint via Proximal Point Method
Open Source Code	Yes	Code is open-sourced at the following GitHub link: https://github.com/siqiaomu/r2d.
Open Datasets	Yes	We consider two real-world datasets and neural network models with highly nonconvex loss functions. For small-scale experiments, we train a multilayer perceptron (MLP) with 3 hidden layers to perform classification on the eICU dataset, a large multi-center intensive care unit (ICU) database consisting of tabular data on ICU admissions [36]. The eICU dataset can be obtained after following the instructions at https://eicu-crd.mit.edu/gettingstarted/access/. For large-scale experiments, we consider a subset of the VGGFace2 dataset, which is composed of approximately 9,000 celebrities and their face images from the internet [5]. The MAAD-Face annotations are available at https://github.com/pterhoer/MAAD-Face and the VGG-Face dataset is available at https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/.
Dataset Splits	Yes	To test unlearning, we remove the data associated with a subset of the users (1%-2% of the data) and observe the impact on the original classification task. For Lacuna-100, we construct an OOD dataset using an additional 100 users from the VGGFace2 database. For eICU, we construct the OOD dataset from data samples in the test set belonging to users not present in the training set. For both problem settings, we combine a subsample of the constructed OOD set with the unlearned set to form a 50-50 balanced training set for the attack model.
Hardware Specification	Yes	All experiments were run using PyTorch 2.5.1 and CUDA 12.4, either on an Intel(R) Xeon(R) Silver 4214 CPU (2.20GHz) with an NVIDIA GeForce RTX 2080 (12 GB) or on an Intel(R) Xeon(R) Silver 4208 CPU (2.10GHz) with an NVIDIA RTX A6000 GPU (48 GB). Almost all experiments require less than 8 GB of GPU VRAM, except for running the HF algorithm on the Lacuna-100 dataset, which requires at least 29 GB of GPU VRAM to implement the scaled-down version.
Software Dependencies	Yes	All experiments were run using PyTorch 2.5.1 and CUDA 12.4.
Experiment Setup	Yes	Table 4: R2D experiment parameters for the eICU and Lacuna-100 datasets (values listed as eICU with MLP / Lacuna-100 with ResNet-18). Size of training dataset n: 94449 / 32000; Number of users: 119282 / 100; Percent data unlearned: 1% / 2%; Number of model parameters d: 136386 / 11160258; Batch size: 2048 / 512; L: 0.2065; G: 0.5946; η: 0.01 / 0.01; Number of training epochs: 78 / 270. Table 5: Experiment parameters of HF and CNS for the eICU and Lacuna-100 datasets (values listed as eICU / Lacuna-100). Hessian-Free Unlearning: Batch size: 512 / 256; η0: 0.1 / 0.1; Step size decay: 0.995 / 0.995; Gradient norm clipping: 5 / 5; Number of training epochs: 15 / 25; Optimizer: SGD / SGD. Constrained Newton Step: Batch size: 128 / 128; η: 0.001 / 0.001; Weight decay: 0.0005 / 0.0005; Parameter norm constraint R: 10 / 21; Number of training epochs: 30 / 30; Convex constant: 200 / 2,000; Hessian scale H: 50,000 / 50,000; Optimizer: Adam / Adam.
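
The Dataset Splits entry above describes combining a subsample of the OOD set with the unlearned set to form a 50-50 balanced training set for the attack model. The following is a minimal sketch of that balancing step only, not the authors' implementation. The function name build_attack_set, the label convention (1 for unlearned samples, 0 for OOD samples), and the seed argument are all hypothetical, and the toy identifiers stand in for real data.

```python
import random


def build_attack_set(unlearned, ood_pool, seed=0):
    """Return shuffled (sample, label) pairs with equal counts per label.

    Hypothetical convention: label 1 marks a sample from the unlearned set,
    label 0 marks a sample drawn from the OOD pool.
    """
    rng = random.Random(seed)
    # Use as many samples per class as the smaller collection allows.
    k = min(len(unlearned), len(ood_pool))
    members = rng.sample(list(unlearned), k)
    non_members = rng.sample(list(ood_pool), k)
    pairs = [(x, 1) for x in members] + [(x, 0) for x in non_members]
    rng.shuffle(pairs)
    return pairs


# Toy usage with placeholder identifiers instead of real records.
unlearned = [f"u{i}" for i in range(20)]
ood_pool = [f"o{i}" for i in range(100)]
attack_set = build_attack_set(unlearned, ood_pool)
```

Subsampling the larger collection down to the size of the smaller one is one simple way to reach a 50-50 balance; the paper's repository may balance the sets differently.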