Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Distillation Robustifies Unlearning

Authors: Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Turner

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We benchmark UNDO against a variety of unlearning methods on language and arithmetic tasks, extending the Pareto frontier of retain performance versus forget unlearning robustness. We also show competitive performance and robustification of unlearning on the more realistic Weapons of Mass Destruction Proxy (WMDP) benchmark.
Researcher Affiliation	Academia	Bruce W. Lee (1,2), Addie Foote (1), Alex Infanger (1), Leni Shor (1,3), Harish Kamath (1), Jacob Goldman-Wetzler (1,4), Bryce Woodworth (1), Alex Cloud, Alexander Matt Turner. Affiliations: (1) ML Alignment & Theory Scholars; (2) University of Pennsylvania; (3) Massachusetts Institute of Technology; (4) Brown University.
Pseudocode	Yes	The pseudocode is as follows:
    import torch

    def do_corruption(model, noise_alpha, noise_beta=0.1, seed=42):
        # Loop through all parameters and add random noise
        assert 0 <= noise_alpha <= 1
        assert 0 <= noise_beta
        for param in model.parameters():
            if param.requires_grad:
                # Initialize corruption tensor
                corruption = torch.zeros_like(param.data)
                # Generate appropriate noise based on parameter dimensionality
                if len(param.data.shape) == 2:
                    # For weight matrices (2D tensors), use Xavier init
                    noise = torch.nn.init.xavier_uniform_(
                        torch.empty_like(param.data)
                    )
                elif len(param.data.shape) == 1:
                    # For bias vectors (1D tensors), use zeros
                    noise = torch.zeros_like(param.data)
                else:
                    raise RuntimeError(
                        f"Unsupported parameter shape: {param.data.shape}"
                    )
                # Scale the noise by beta
                corruption = noise_beta * noise
                # Apply weighted combination: (1 - alpha) * param + alpha * corruption
                param.data = (1 - noise_alpha) * param.data + noise_alpha * corruption
        # Move model to appropriate device
        # (assumption: `device` is defined elsewhere; the quoted snippet does not define it)
        model.to(device)
Open Source Code	Yes	We share our code implementation publicly through GitHub.
Open Datasets	Yes	Our process starts from collecting datasets to facilitate our language and arithmetic unlearning experiments. For language experiments, we utilize two sources: (1) the Korean subset of Hugging Face HuggingFaceFW/fineweb-2, providing non-English language examples, and (2) Hugging Face HuggingFaceFW/fineweb-edu, containing English language examples with high-quality content [59, 58]. We sample 10 million rows from each source.
Dataset Splits	Yes	Both language and arithmetic datasets have separate test sets. For language data, we allocated 1,000,000 tokens for validation, which consists of 500,000 tokens for the retain domain (English) and 500,000 tokens for the forget domain (Korean). For arithmetic data, we allocated 1,000,000 questions for validation, which consists of 500,000 questions for the retain domain (addition and subtraction) and 500,000 questions for the forget domain (multiplication and division).
Hardware Specification	Yes	We run all experiments on servers with multiple H200 or A100 GPUs.
Software Dependencies	No	The paper names software components (e.g., "AdamW optimizer", "Google Gemma-2-2b tokenizer", "open-unlearning framework") but gives no version numbers for any key software or libraries.
Experiment Setup	Yes	Table 2: Pretraining Hyperparameters, Table 3: Oracle Matching Hyperparameters, Table 4: Relearning Hyperparameters, Table 5: Unlearning Hyperparameters, Table 6: Distillation Hyperparameters. These tables explicitly list hyperparameters like learning rate, batch size, epochs, maximum steps, weight decay, LR schedule, and sequence length.
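The corruption step quoted in the Pseudocode row interpolates each trainable parameter toward scaled noise, θ ← (1 − α)·θ + α·β·ε, where ε is Xavier-uniform noise for weight matrices and zero for bias vectors. The following is a dependency-free sketch of that update, assuming plain Python lists stand in for tensors; the function names `xavier_uniform`, `corrupt_matrix`, and `corrupt_bias` are hypothetical and not taken from the authors' repository.

```python
import math
import random


def xavier_uniform(fan_out, fan_in, rng):
    """Sample a fan_out x fan_in matrix from U(-b, b), b = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def corrupt_matrix(weights, noise_alpha, noise_beta=0.1, seed=42):
    """Return (1 - alpha) * W + alpha * (beta * N), with N Xavier-uniform noise.

    Hypothetical list-based stand-in for the 2D branch of do_corruption.
    """
    assert 0 <= noise_alpha <= 1
    assert 0 <= noise_beta
    rng = random.Random(seed)  # seeded so the same call gives the same noise
    rows, cols = len(weights), len(weights[0])
    # Rows play the role of fan_out and columns of fan_in, as for a 2D weight tensor.
    noise = xavier_uniform(rows, cols, rng)
    return [
        [(1 - noise_alpha) * w + noise_alpha * noise_beta * n for w, n in zip(w_row, n_row)]
        for w_row, n_row in zip(weights, noise)
    ]


def corrupt_bias(bias, noise_alpha):
    """1D branch: the noise is zero, so the update only shrinks the bias by (1 - alpha)."""
    assert 0 <= noise_alpha <= 1
    return [(1 - noise_alpha) * b for b in bias]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    print(corrupt_matrix(W, 0.5))
    print(corrupt_bias([1.0, -2.0], 0.5))
```

With `noise_alpha = 0` the weights are returned unchanged, and with `noise_alpha = 1` they are replaced entirely by beta-scaled noise, so `noise_alpha` controls how strongly the trained parameters are perturbed.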
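The Dataset Splits row holds out a fixed number of validation questions per domain, with addition and subtraction as the retain domain and multiplication and division as the forget domain. Below is a minimal sketch of such a per-domain hold-out, assuming questions are strings of the hypothetical form "a op b ="; the question format, the function names `domain_of` and `make_validation_split`, and the seed are illustrative assumptions, not the paper's code.

```python
import random

# Operator-to-domain mapping following the Dataset Splits description:
# addition/subtraction are retained, multiplication/division are forgotten.
RETAIN_OPS = {"+", "-"}
FORGET_OPS = {"*", "/"}


def domain_of(question):
    """Classify a question such as '12 * 7 =' by its operator (assumed format)."""
    op = question.split()[1]
    if op in RETAIN_OPS:
        return "retain"
    if op in FORGET_OPS:
        return "forget"
    raise ValueError(f"unknown operator in {question!r}")


def make_validation_split(questions, per_domain, seed=0):
    """Hold out up to `per_domain` questions from each domain; return (train, val)."""
    rng = random.Random(seed)
    shuffled = list(questions)
    rng.shuffle(shuffled)
    taken = {"retain": 0, "forget": 0}
    train, val = [], []
    for q in shuffled:
        d = domain_of(q)
        if taken[d] < per_domain:
            val.append(q)
            taken[d] += 1
        else:
            train.append(q)
    return train, val


if __name__ == "__main__":
    qs = [f"{a} {op} {b} =" for a in range(10) for b in range(1, 10) for op in "+-*/"]
    train, val = make_validation_split(qs, per_domain=5)
    print(len(train), len(val))
```

Calling `make_validation_split(questions, per_domain=500_000)` would yield a 1,000,000-question validation set split evenly between the two domains, matching the sizes reported above, provided each domain contains at least 500,000 questions.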