Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
Authors: Jimmy Di, Jack Douglas, Jayadev Acharya, Gautam Kamath, Ayush Sekhari
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our attack when unlearning is performed via retraining from scratch, the idealized setting of machine unlearning which other efficient methods attempt to emulate, as well as against the approximate unlearning approach of Graves et al. [2021]. |
| Researcher Affiliation | Academia | Jimmy Z. Di, University of Waterloo (jimmy.di@uwaterloo.ca); Jack Douglas, University of Waterloo (jack.douglas@uwaterloo.ca); Jayadev Acharya, Cornell University (acharya@cornell.edu); Gautam Kamath, University of Waterloo and Vector Institute (g@csail.mit.edu); Ayush Sekhari, Massachusetts Institute of Technology (sekhari@mit.edu) |
| Pseudocode | Yes | Algorithm 1 Gradient Matching to generate camouflages; Algorithm 2 Gradient Matching to generate poisons [Geiping et al., 2021] (a hedged sketch of the gradient-alignment step appears after this table) |
| Open Source Code | No | We plan to make our code public with the final version of the paper. |
| Open Datasets | Yes | We perform extensive evaluations on the (multiclass) CIFAR-10 classification task with various popular large-scale neural network models including VGG-11, VGG-16 [Simonyan and Zisserman, 2015], ResNet-18, ResNet-34, ResNet-50 [He et al., 2016], and MobileNetV2 [Sandler et al., 2018], trained using cross-entropy loss. (...) We evaluate the efficacy of our attack vector on the challenging multiclass classification problem on the Imagenette and Imagewoof datasets [Howard, 2019]. |
| Dataset Splits | No | The paper frequently mentions 'validation accuracy' (e.g., 'Each trained model had validation accuracy of around 81.63% on the clean dataset S_cl'), but it does not explicitly state the split percentages or sample counts used for the validation set, nor does it refer to a standard validation split. It only explicitly mentions the training and test split sizes. |
| Hardware Specification | No | The paper mentions training on 'a single CPU' or 'a single GPU' and '200GB of memory' for some experiments, but it does not specify any particular CPU or GPU models (e.g., Intel i7, NVIDIA A100) or other specific hardware configurations. It mentions computational resources from the 'Digital Research Alliance of Canada' but not specific machine types. |
| Software Dependencies | No | The paper mentions using 'Scikit-learn [Pedregosa et al., 2011]' and 'PyTorch [Paszke et al., 2019]' but does not provide version numbers for these libraries or any other dependencies. The citations identify the corresponding papers, but the paper does not state the software versions used for the experiments. |
| Experiment Setup | Yes | Each model is trained with cross-entropy loss ℓ(f(x,θ),y) = − log(Pr(y = f(x,θ))) on a single GPU using PyTorch [Paszke et al., 2019], and using mini-batch SGD with weight decay 5e-4, momentum 0.9, learning rate 0.01, batch size 100, and 40 epochs over the training dataset. Each poison/camouflage generation took about 1.5 hours. (...) We start with a learning rate of 0.01, and exponentially decay it with γ = 0.9 after every epoch, for a total of 50 epochs over the training dataset. (A hedged sketch of this training configuration follows the table.) |
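
The Pseudocode row points to the gradient-matching procedure of Geiping et al. [2021], which the paper adapts to craft both poisons and camouflages. Below is a minimal, hedged PyTorch sketch of the core gradient-alignment step only; it is not the authors' implementation, and the names `model`, `target_x`, `adv_y`, `base_x`, `base_y`, `delta`, `eps`, and the signed-gradient update are illustrative assumptions.

```python
# Hedged sketch of one gradient-matching (alignment) step, in the spirit of
# Geiping et al. [2021]; variable names and the update rule are assumptions,
# not the paper's released code. `model` is assumed to be in eval() mode.
import torch
import torch.nn.functional as F

def flat_grad(loss, params, create_graph=False):
    # Gradient of `loss` w.r.t. the model parameters, flattened into one vector.
    grads = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_matching_step(model, target_x, adv_y, base_x, base_y, delta, eps, lr=0.1):
    """One optimization step on the poison perturbation `delta`."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the adversary wants training to follow: loss of the target
    # example evaluated under the adversarial label.
    target_grad = flat_grad(F.cross_entropy(model(target_x), adv_y), params).detach()

    # Gradient actually induced by the perturbed poison points (clean labels),
    # kept differentiable w.r.t. `delta` via create_graph=True.
    poison_grad = flat_grad(F.cross_entropy(model(base_x + delta), base_y),
                            params, create_graph=True)

    # Alignment objective: 1 - cosine similarity between the two gradients.
    align_loss = 1.0 - F.cosine_similarity(poison_grad, target_grad, dim=0)

    # Signed-gradient step on `delta`, then project back onto the l_inf ball
    # and keep the poisoned pixels in the valid [0, 1] range.
    (delta_grad,) = torch.autograd.grad(align_loss, delta)
    with torch.no_grad():
        delta -= lr * delta_grad.sign()
        delta.clamp_(-eps, eps)
        delta.copy_((base_x + delta).clamp(0, 1) - base_x)
    return float(align_loss)
```

In this sketch an attack would initialize `delta = torch.zeros_like(base_x, requires_grad=True)`, repeat the step for many iterations, and then train the victim model on the clean data together with `base_x + delta`; camouflage generation (Algorithm 1) reuses the same alignment idea with a different target gradient.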
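
The Experiment Setup row fixes concrete optimizer hyperparameters. As a reading aid, here is a minimal PyTorch sketch mirroring the quoted configuration (mini-batch SGD, weight decay 5e-4, momentum 0.9, learning rate 0.01, batch size 100, 40 epochs, plus the second quoted variant with exponential decay, γ = 0.9, over 50 epochs); `model` and `train_set` are placeholder assumptions, not the paper's code.

```python
# Sketch of the quoted training configuration; `model` and `train_set` are
# placeholders (e.g., a torchvision ResNet-18 and CIFAR-10), not the paper's code.
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model, train_set, device="cuda", epochs=40, lr=0.01, gamma=None):
    loader = DataLoader(train_set, batch_size=100, shuffle=True)   # batch size 100
    criterion = nn.CrossEntropyLoss()                              # cross-entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)   # quoted SGD settings
    # Second quoted setting: exponential decay with gamma = 0.9 after every epoch.
    scheduler = (torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
                 if gamma is not None else None)

    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        if scheduler is not None:
            scheduler.step()
    return model
```

Under these assumptions, the first quoted setting corresponds to `train(model, train_set, epochs=40)` and the second to `train(model, train_set, epochs=50, gamma=0.9)`.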