Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Machine Unlearning Fails to Remove Data Poisoning Attacks
Authors: Martin Pawelczyk, Jimmy Di, Yiwei Lu, Gautam Kamath, Ayush Sekhari, Seth Neel
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that, while existing unlearning methods have been demonstrated to be effective in a number of settings, they fail to remove the effects of data poisoning across a variety of types of poisoning attacks (indiscriminate, targeted, and a newly-introduced Gaussian poisoning attack) and models (image classifiers and LLMs); even when granted a relatively large compute budget. |
| Researcher Affiliation | Collaboration | ¹Harvard University, ²University of Waterloo, ³Vector Institute, ⁴MIT, ⁵Google |
| Pseudocode | Yes | Algorithm 1: Gaussian Unlearning Score (GUS). Input: Model θ to be evaluated. Algorithm 2: Gaussian Data Poisoning to Evaluate Unlearning. Input: Unlearning algorithm Unlearn-Alg to be evaluated. Algorithm 3: Gradient Matching to generate poisons (Geiping et al., 2021). Algorithm 4: Gradient Canceling (GC) Attack (Lu et al., 2023). |
| Open Source Code | Yes | We release the code for our Gaussian data poisoning method at: https://github.com/MartinPawel/OpenUnlearn. |
| Open Datasets | Yes | For the language task, we consider the IMDb dataset (Maas et al., 2011). ... For the vision task, we use the CIFAR-10 dataset (Krizhevsky et al., 2010). |
| Dataset Splits | No | The paper discusses using |
| Hardware Specification | No | The paper mentions 'compute budget' and 'computational constraints' but does not specify any particular hardware models like GPUs or CPUs used for the experiments. |
| Software Dependencies | No | The paper mentions models like Resnet-18 and GPT-2, and optimizers like SGD and Adam, but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | Models. For the vision tasks, we train a standard Resnet-18 model for 100 epochs. We conduct the language experiments on GPT-2 (355M parameters) LLMs (Radford et al., 2019). ... We train these models for 10 epochs on the poisoned IMDb training dataset. ... GD using the following hyperparameters: SGD optimizer with a lr = 1e-3, momentum = 0.9, and weight_decay = 5e-4. ... NGD using the same hyperparameters as GD with the additional Gaussian noise variance σ² ∈ {1e-07, 1e-06}. ... GA using the similar hyperparameters as GD but with a smaller lr = [5e-6, 1e-5]. ... EUk ... with a learning rate of 1e-3, 1e-4, 1e-5 and the number of layers to retrain K = 3. ... CFk, we experiment with a learning rate of {1e-3, 1e-4, 1e-5} and the number of layers to retrain set to K = 3. ... Compute budget. ... up to 10% of the compute used in initial training (or fine-tuning) of the model. |
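
The hyperparameters quoted in the Experiment Setup row can be collected into a small grid for reference. The sketch below is illustrative only: the dictionary layout, key names, and the helper `max_unlearning_epochs` are hypothetical and do not come from the paper or its repository. It also assumes that compute scales linearly with the number of epochs, which the quoted text does not state, and it treats the bracketed GA learning rates as two discrete values, although the source notation could also denote a range.

```python
# Hyperparameter values as quoted in the Experiment Setup row.
# The structure and key names are hypothetical, not the authors' config format.
UNLEARNING_GRID = {
    "GD": {"optimizer": "SGD", "lr": [1e-3], "momentum": 0.9, "weight_decay": 5e-4},
    "NGD": {
        "optimizer": "SGD",
        "lr": [1e-3],
        "momentum": 0.9,
        "weight_decay": 5e-4,
        "noise_variance": [1e-7, 1e-6],  # sigma^2 values listed for NGD
    },
    # The source writes lr = [5e-6, 1e-5]; read here as two values (assumption).
    "GA": {"optimizer": "SGD", "lr": [5e-6, 1e-5], "momentum": 0.9, "weight_decay": 5e-4},
    "EUk": {"lr": [1e-3, 1e-4, 1e-5], "layers_to_retrain": 3},
    "CFk": {"lr": [1e-3, 1e-4, 1e-5], "layers_to_retrain": 3},
}

# Initial training lengths quoted in the same row.
TRAINING_EPOCHS = {"resnet18_cifar10": 100, "gpt2_imdb": 10}


def max_unlearning_epochs(training_epochs: int, budget_percent: int = 10) -> float:
    """Upper bound on unlearning epochs under a percentage compute budget.

    Assumes compute is proportional to epochs (a simplification).
    """
    return training_epochs * budget_percent / 100


if __name__ == "__main__":
    for name, epochs in TRAINING_EPOCHS.items():
        print(name, max_unlearning_epochs(epochs))
```

Under the linear-compute assumption, the 10% budget corresponds to at most 10 epochs for the Resnet-18 runs and 1 epoch for the GPT-2 runs; the paper may account for compute differently.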