Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Ascent Fails to Forget
Authors: Ioannis Mavrothalassitis, Pol Puigdemont, Noam Levi, Volkan Cevher
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical and theoretical evidence showing these methods often fail precisely due to this overlooked relationship. Our theoretical insights are corroborated by experiments on complex neural networks, demonstrating that these methods do not perform as expected in practice due to this unaddressed statistical interplay. In our main experiments, we examine two gradient-based unlearning approaches... We conduct these experiments using ResNet-9 models on Cifar-10... |
| Researcher Affiliation | Academia | Ioannis Mavrothalassitis Pol Puigdemont Noam Itzhak Levi Volkan Cevher LIONS, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland EMAIL |
| Pseudocode | No | The paper describes methods like Gradient Ascent and Gradient Descent/Ascent in text and mathematical equations, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We included the code along with instructions for its reproducibility. |
| Open Datasets | Yes | We conduct these experiments using ResNet-9 models on Cifar-10 [39], ... Q: What results do you obtain when working with highly correlated datasets such as MNIST or Fashion MNIST? A: MNIST [45]... Fashion MNIST [46] |
| Dataset Splits | Yes | We also adopt nine forget sets directly from Georgiev et al. [38], which comprise both random subsets and semantically coherent subpopulations identified via principal-component analysis of the datamodel influence matrix. ... Forget set 1: 10 random samples. ... We measure KLoM values over each data-point in a set and report the 95th percentile in each group. |
| Hardware Specification | Yes | All experiments were conducted on a server equipped with eight NVIDIA A100-SXM4 GPUs, each with 80 GB of GPU memory. |
| Software Dependencies | No | Our implementation is based on Rinberg et al. [47] which follows the methodology in Georgiev et al. [38]. We pretrain ResNet-9 for 24 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.4, following a cyclic schedule that peaks at epoch 5. We employ a batch size of 512, momentum of 0.9, and a weight-decay coefficient of $5 \times 10^{-4}$. The paper mentions methodology and optimizers but lacks specific software library versions (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | We pretrain ResNet-9 for 24 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.4, following a cyclic schedule that peaks at epoch 5. We employ a batch size of 512, momentum of 0.9, and a weight-decay coefficient of $5 \times 10^{-4}$. ... Gradient Ascent: Optimized with SGD. Learning rates: $\{1 \times 10^{-5}, 5 \times 10^{-5}, 1 \times 10^{-4}, 5 \times 10^{-4}, 1 \times 10^{-3}, 1 \times 10^{-2}, 5 \times 10^{-2}\}$; epochs: {1, 3, 5, 7, 10}. |
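
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is illustrative only and is not the authors' code: the names `PRETRAIN`, `triangular_lr`, and `gradient_ascent_grid` are hypothetical, the powers of ten are written in Python float notation (e.g. `5e-4`), and the schedule shape (linear warm-up from 0 to a peak of 0.4 at epoch 5, then linear decay to 0 at epoch 24) is an assumption, since the quoted text only says the schedule is cyclic and peaks at epoch 5.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
PRETRAIN = {
    "epochs": 24,
    "peak_lr": 0.4,       # assumed to be the peak of the cyclic schedule
    "peak_epoch": 5,
    "batch_size": 512,
    "momentum": 0.9,
    "weight_decay": 5e-4,
}
GA_LEARNING_RATES = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 1e-2, 5e-2]
GA_EPOCHS = [1, 3, 5, 7, 10]


def triangular_lr(epoch, peak_lr, peak_epoch, total_epochs):
    """Assumed schedule shape: linear ramp 0 -> peak_lr, then linear decay to 0."""
    if epoch <= peak_epoch:
        return peak_lr * epoch / peak_epoch
    remaining = (total_epochs - epoch) / (total_epochs - peak_epoch)
    return peak_lr * max(0.0, remaining)


def gradient_ascent_grid():
    """Enumerate every (learning rate, epochs) pair in the Gradient Ascent sweep."""
    return [
        {"lr": lr, "epochs": epochs}
        for lr, epochs in product(GA_LEARNING_RATES, GA_EPOCHS)
    ]


if __name__ == "__main__":
    grid = gradient_ascent_grid()
    print(f"{len(grid)} Gradient Ascent configurations")
    for epoch in (0, PRETRAIN["peak_epoch"], PRETRAIN["epochs"]):
        lr = triangular_lr(
            epoch, PRETRAIN["peak_lr"], PRETRAIN["peak_epoch"], PRETRAIN["epochs"]
        )
        print(f"epoch {epoch}: lr = {lr:.3f}")
```

Taking the full Cartesian product of the seven learning rates and five epoch counts is also an assumption; the quoted text lists the two sets but does not state whether every combination was run.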