Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Authors: Neil G. Marchant, Benjamin I. P. Rubinstein, Scott Alfeld
Pages: 7691-7700
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this paper we demonstrate how an attacker can exploit this oversight, highlighting a novel attack surface introduced by machine unlearning. We consider an attacker aiming to increase the computational cost of data removal. We derive and empirically investigate a poisoning attack on certified machine unlearning where strategically designed training data triggers complete retraining when removed." and, from Section 4 (Experiments): "We investigate the impact of our attack on the computational cost of unlearning in a variety of simulated settings." |
| Researcher Affiliation | Academia | Neil G. Marchant,1 Benjamin I. P. Rubinstein,1 Scott Alfeld2 1 School of Computing and Information Systems, University of Melbourne, Parkville, Australia 2 Department of Computer Science, Amherst College, Amherst, MA, USA EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Learning algorithm for certified removal and Algorithm 2: Unlearning algorithm for certified removal |
| Open Source Code | Yes | Code is available at https://github.com/ngmarchant/attackunlearning. |
| Open Datasets | Yes | We consider MNIST (LeCun et al. 1998) and Fashion MNIST (Xiao, Rasul, and Vollgraf 2017), both of which contain d = 28 × 28 single-channel images from ten classes. We also generate a smaller binary classification dataset from MNIST we call Binary-MNIST which contains classes 3 and 8. Beyond the image domain, we consider human activity recognition (HAR) data (Anguita et al. 2013). |
| Dataset Splits | No | The paper mentions 'train/test splits' but does not explicitly specify validation splits or their proportions. |
| Hardware Specification | No | The paper mentions 'HPC-GPGPU Facility' but does not provide specific hardware details like GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as Python libraries or frameworks. |
| Experiment Setup | No | Further details about the experiments, including hardware, parameter settings, poisoning ratios, and the number of trials in each experiment are provided in Appendix B of our extended paper (Marchant, Rubinstein, and Alfeld 2021). |