Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks

Authors: Yiwei Lu, Gautam Kamath, Yaoliang Yu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing indiscriminate data poisoning baselines over a range of datasets and models.
Researcher Affiliation | Collaboration | 1. School of Computer Science, University of Waterloo, Canada; 2. Vector Institute.
Pseudocode | Yes | Algorithm 1: Gradient Canceling (GC) Attack (a hedged sketch follows the table)
Open Source Code | Yes | Our code is available at https://github.com/watml/plim.
Open Datasets | Yes | Dataset: We consider image classification on MNIST (Deng, 2012) (60k training and 10k test images), CIFAR-10 (Krizhevsky, 2009) (50k training and 10k test images), and TinyImageNet (Chrabaszcz et al., 2017) (100k training, 10k validation and 10k test images).
Dataset Splits | Yes | For the first two datasets, we further split the training data into a 70% training set and a 30% validation set, respectively.
Hardware Specification | Yes | Hardware and package: experiments were run on a cluster with T4 and P100 GPUs.
Software Dependencies | No | The platform we use is PyTorch (Paszke et al., 2019).
Experiment Setup | Yes | Optimizer, learning rate scheduler and hyperparameters: we use SGD with momentum for optimization and the cosine learning rate scheduler (Loshchilov and Hutter, 2017) for the Gradient Canceling algorithm. We set the initial learning rate as 0.5 and run 1000 epochs across every experiment (see the usage sketch after the table).
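
The Algorithm 1 row refers to the Gradient Canceling (GC) attack, in which poisoned inputs are optimized so that their gradients cancel the clean-data gradient at the target parameters, driving the total training gradient toward zero. The sketch below is a minimal PyTorch illustration of that idea, reusing the optimizer settings quoted in the Experiment Setup row (SGD with momentum, cosine schedule, initial learning rate 0.5, 1000 steps). The function name, the fixed random poison labels, the uniform initialization, the momentum value, and the unweighted sum of clean and poison gradients are assumptions for illustration only; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of the Gradient Canceling (GC) idea: given target parameters
# theta_t and the clean-data gradient at theta_t, optimize a small set of
# poisoned inputs so that the combined gradient is driven toward zero.
import torch
import torch.nn.functional as F

def gradient_canceling(target_model, clean_grad, num_classes,
                       n_poison=500, input_shape=(1, 28, 28),
                       steps=1000, lr=0.5, device="cpu"):
    """Return poisoned inputs/labels whose gradients approximately cancel clean_grad."""
    # clean_grad: list of tensors, the gradient of the clean training loss at theta_t.
    x_p = torch.rand(n_poison, *input_shape, device=device, requires_grad=True)
    y_p = torch.randint(0, num_classes, (n_poison,), device=device)  # labels kept fixed (assumption)
    opt = torch.optim.SGD([x_p], lr=lr, momentum=0.9)                # momentum value assumed
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)

    params = [p for p in target_model.parameters() if p.requires_grad]
    for _ in range(steps):
        opt.zero_grad()
        loss_p = F.cross_entropy(target_model(x_p), y_p, reduction="sum")
        grad_p = torch.autograd.grad(loss_p, params, create_graph=True)
        # Objective: squared norm of (clean gradient + poison gradient) at theta_t.
        obj = sum(((g_c + g_q) ** 2).sum() for g_c, g_q in zip(clean_grad, grad_p))
        obj.backward()
        opt.step()
        sched.step()
        with torch.no_grad():
            x_p.clamp_(0.0, 1.0)  # keep poisoned images in the valid pixel range
    return x_p.detach(), y_p
```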
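The dataset, split, and hyperparameter rows can be tied together in a short usage sketch: a 70/30 split of the MNIST training data via torchvision, the clean-data gradient evaluated at the target parameters, and a call to the GC sketch above with the quoted settings (initial learning rate 0.5, 1000 epochs). The batch size, poisoning budget, and placeholder linear model standing in for the target model theta_t (which the paper obtains separately) are assumptions, not values taken from the paper.

```python
# Usage sketch: MNIST with a 70/30 train/validation split, clean-data gradient
# at the target parameters, then a GC run with the quoted hyperparameters.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
n_train = int(0.7 * len(full_train))                       # 70% training split
train_set, val_set = random_split(full_train, [n_train, len(full_train) - n_train])
train_loader = DataLoader(train_set, batch_size=256)       # batch size assumed

target_model = torch.nn.Sequential(torch.nn.Flatten(),
                                   torch.nn.Linear(28 * 28, 10))  # placeholder for theta_t
params = [p for p in target_model.parameters() if p.requires_grad]

# Gradient of the clean training loss at the target parameters.
clean_grad = [torch.zeros_like(p) for p in params]
for x, y in train_loader:
    loss = F.cross_entropy(target_model(x), y, reduction="sum")
    for g, g_batch in zip(clean_grad, torch.autograd.grad(loss, params)):
        g += g_batch

# Poisoning budget (n_poison) assumed for illustration.
x_p, y_p = gradient_canceling(target_model, clean_grad, num_classes=10,
                              n_poison=3000, steps=1000, lr=0.5)
```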