Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Ascent Fails to Forget

Authors: Ioannis Mavrothalassitis, Pol Puigdemont, Noam Levi, Volkan Cevher

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay. In our main experiments, we examine two gradient-based unlearning approaches... We conduct these experiments using ResNet-9 models on CIFAR-10...
Researcher Affiliation	Academia	Ioannis Mavrothalassitis, Pol Puigdemont, Noam Itzhak Levi, Volkan Cevher. LIONS, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Pseudocode	No	The paper describes methods like Gradient Ascent and Gradient Descent/Ascent in text and mathematical equations, but does not include explicit pseudocode or algorithm blocks.
Open Source Code	Yes	We included the code along with instructions for its reproducibility.
Open Datasets	Yes	We conduct these experiments using ResNet-9 models on CIFAR-10 [39], ... What results do you obtain when working with highly correlated datasets such as MNIST or Fashion MNIST? A: MNIST [45]... Fashion MNIST [46]
Dataset Splits	Yes	We also adopt nine forget sets directly from Georgiev et al. [38], which comprise both random subsets and semantically coherent subpopulations identified via principal-component analysis of the datamodel influence matrix. ... Forget set 1: 10 random samples. ... We measure KLoM values over each data-point in a set and report the 95th percentile in each group.
Hardware Specification	Yes	All experiments were conducted on a server equipped with eight NVIDIA A100-SXM4 GPUs, each with 80 GB of GPU memory.
Software Dependencies	No	Our implementation is based on Rinberg et al. [47] which follows the methodology in Georgiev et al. [38]. We pretrain ResNet-9 for 24 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.4, following a cyclic schedule that peaks at epoch 5. We employ a batch size of 512, momentum of 0.9, and a weight-decay coefficient of 5 × 10⁻⁴. The paper mentions methodology and optimizers but lacks specific software library versions (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup	Yes	We pretrain ResNet-9 for 24 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.4, following a cyclic schedule that peaks at epoch 5. We employ a batch size of 512, momentum of 0.9, and a weight-decay coefficient of 5 × 10⁻⁴. ... Gradient Ascent: Optimized with SGD. Learning rates: {1 × 10⁻⁵, 5 × 10⁻⁵, 1 × 10⁻⁴, 5 × 10⁻⁴, 1 × 10⁻³, 1 × 10⁻², 5 × 10⁻²}; epochs: {1, 3, 5, 7, 10}.
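
The settings quoted in the Experiment Setup row can be written down as a small configuration, which may help when trying to reproduce the sweep. This is a minimal sketch, not the authors' code: the names PRETRAIN, GA_LEARNING_RATES, GA_EPOCHS, and gradient_ascent_grid are hypothetical, the exponents are assumed to be negative (the signs were lost in extraction), and it assumes the Gradient Ascent search covers every learning-rate and epoch combination, which the quoted text does not state explicitly.

```python
from itertools import product

# Pretraining settings as quoted in the Experiment Setup row.
# Key names are illustrative, not taken from the paper's code.
PRETRAIN = {
    "model": "ResNet-9",
    "dataset": "CIFAR-10",
    "epochs": 24,
    "optimizer": "SGD",
    "initial_lr": 0.4,
    "lr_peak_epoch": 5,      # the cyclic schedule peaks at epoch 5
    "batch_size": 512,
    "momentum": 0.9,
    "weight_decay": 5e-4,    # assumed negative exponent
}

# Gradient Ascent search space as quoted (negative exponents assumed).
GA_LEARNING_RATES = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 1e-2, 5e-2]
GA_EPOCHS = [1, 3, 5, 7, 10]


def gradient_ascent_grid():
    """Return every (learning rate, epochs) pair, assuming a full grid."""
    return [
        {"lr": lr, "epochs": epochs}
        for lr, epochs in product(GA_LEARNING_RATES, GA_EPOCHS)
    ]


if __name__ == "__main__":
    grid = gradient_ascent_grid()
    # 7 learning rates x 5 epoch counts = 35 candidate configurations
    print(len(grid))
```

Under the full-grid assumption this yields 35 Gradient Ascent configurations per forget set; if the authors searched only a subset, the list would need to be pruned accordingly.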