Autoregressive Perturbations for Data Poisoning

Authors: Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein, David Jacobs

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the generality of AR poisoning by creating poisons across four datasets, including different image sizes and number of classes. Notably, we use the same set of AR processes to poison SVHN [22], STL-10 [6], and CIFAR-10 [20] since all of these datasets are 10-class classification problems. We demonstrate that despite the victim's choice of network architecture, AR poisons can degrade a network's accuracy on clean test data. We show that while strong data augmentations are an effective defense against all poisons we consider, AR poisoning is largely resistant.
Researcher Affiliation | Academia | Pedro Sandoval-Segura¹, Vasu Singla¹, Jonas Geiping¹, Micah Goldblum², Tom Goldstein¹, David W. Jacobs¹ (¹University of Maryland, ²New York University)
Pseudocode | No | The paper provides a high-level overview of the algorithm in Figure 2 and additional details in Appendix A.3.2, but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper states 'Full algorithm and details are in Appendix A.3' in the self-assessment checklist (3a), but there is no explicit statement within the main body or appendices that says 'We release our code at...' or provides a link to a code repository for the methodology described.
Open Datasets | Yes | We demonstrate the generality of AR poisoning by creating poisons across four datasets, including different image sizes and number of classes. Notably, we use the same set of AR processes to poison SVHN [22], STL-10 [6], and CIFAR-10 [20] since all of these datasets are 10-class classification problems.
Dataset Splits | No | The paper specifies the use of a 'test set' for evaluation, for example in Section 3.1: 'The goal is to perturb Dc into a poisoned set Dp such that when DNNs are trained on Dp, they perform poorly on test set Dt.' It also mentions the datasets used and training epochs, but it does not provide explicit details about train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100, Tesla V100) or CPU models. It mentions '6 hours on 4 GPUs' in reference to a previous work ([10]), but not for its own experimental setup.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, specific library versions).
Experiment Setup | Yes | Experimental Settings. We train a number of ResNet-18 (RN-18) models on different poisons with cross-entropy loss for 100 epochs using a batch size of 128. For our optimizer, we use SGD with momentum of 0.9 and weight decay of 5 × 10^-4. We use an initial learning rate of 0.1, which decays by a factor of 10 on epoch 50.
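
For reference, the quoted victim-training configuration maps onto the short PyTorch sketch below. It only illustrates the stated hyperparameters (ResNet-18, cross-entropy loss, 100 epochs, batch size 128, SGD with momentum 0.9 and weight decay 5e-4, learning rate 0.1 decayed by 10x at epoch 50). The torchvision resnet18 stands in for the paper's RN-18 (CIFAR-scale variants usually modify the first convolution), and poisoned_train_set is a hypothetical Dataset of AR-poisoned images; the paper releases neither code nor pinned software versions, so these details are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models import resnet18

def train_victim(poisoned_train_set, num_classes=10, epochs=100, device="cuda"):
    """Train a victim classifier on a (possibly poisoned) training set
    using the hyperparameters quoted in the Experiment Setup entry."""
    model = resnet18(num_classes=num_classes).to(device)
    loader = DataLoader(poisoned_train_set, batch_size=128, shuffle=True, num_workers=4)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # Learning rate decays by a factor of 10 at epoch 50.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)

    for _ in range(epochs):
        model.train()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

Poison effectiveness would then be measured by evaluating the returned model on the clean test set Dt, where a successful availability poison yields low clean-test accuracy.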