Universal Backdoor Attacks

Authors: Benjamin Schneider, Nils Lukas, Florian Kerschbaum

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically evaluate the effectiveness of our backdoor using different encoding methods. We extend this evaluation process to demonstrate the effectiveness of our backdoor when scaling the image classification task in both the number of samples and classes. By choosing which classes are poisoned, we measure the inter-class poison transferability of our poison. Lastly, we evaluate our Universal Backdoor Attack against a suite of popular defenses.
Researcher Affiliation | Academia | Benjamin Schneider, Nils Lukas, Florian Kerschbaum; University of Waterloo; ben.schneider.research@gmail.com, {nlukas, florian.kerschbaum}@uwaterloo.ca
Pseudocode | Yes | Algorithm 1: Universal Poisoning Algorithm
Open Source Code | Yes | Our source code is available at https://github.com/Ben-Schneider-code/Universal-Backdoor-Attacks.
Open Datasets | Yes | For our initial effectiveness evaluation, we use ImageNet-1k with random crop and horizontal flipping (Russakovsky et al., 2014). We use three datasets, ImageNet-2k, ImageNet-4k, and ImageNet-6k, for our scaling experiments. These datasets comprise the largest 2,000, 4,000, and 6,000 classes from the ImageNet-21K dataset (Deng et al., 2009).
Dataset Splits | Yes | The attacker's success rate on class y, denoted ASR_y, is the proportion of validation images for which the attacker can craft a trigger that causes the image to be misclassified as y.
Hardware Specification | No | The paper mentions using ResNet-18/101 models and Hugging Face's CLIP, but does not specify the hardware (e.g., specific GPU or CPU models, memory) used for training or inference.
Software Dependencies | No | The paper mentions using a "pre-trained surrogate from Hugging Face" and "Hugging Face Transformers" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We train our image classifiers using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. Models trained on ImageNet-1K are trained for 90 epochs, while models trained on ImageNet-2K, ImageNet-4K, and ImageNet-6K are trained for 60 epochs to adjust for the larger dataset size. The initial learning rate is set to 0.1 and is decreased by a factor of 10 every 30 epochs on ImageNet-1K and every 20 epochs on the larger datasets. We use a batch size of 128 images for all training runs. Early stopping is applied to all training runs; we stop training when the model's accuracy is no longer improving or the model begins overfitting.
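
The entries above quote enough detail to sketch the data handling, the evaluation metric, and the training loop; the sketches below are illustrative approximations, not the authors' released code. First, the Open Datasets entry describes standard ImageNet-1k augmentation (random crop and horizontal flip) and scaling datasets built from the largest 2,000/4,000/6,000 classes of ImageNet-21k. A minimal sketch, assuming "largest" means the classes with the most samples, a torchvision ImageFolder layout, and a hypothetical local path:

```python
from collections import Counter

import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Standard ImageNet training augmentation mentioned in the paper:
# random crop and horizontal flipping.
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def largest_n_classes(dataset, n):
    """Class indices of the n classes with the most samples in an ImageFolder."""
    counts = Counter(label for _, label in dataset.samples)
    return {label for label, _ in counts.most_common(n)}


# Hypothetical local path to an ImageNet-21k tree in ImageFolder layout.
in21k = ImageFolder("/data/imagenet21k/train", transform=train_transform)

# Restrict to the 2,000 largest classes to mimic ImageNet-2k
# (4,000 and 6,000 would give ImageNet-4k and ImageNet-6k).
keep = largest_n_classes(in21k, 2000)
in21k.samples = [(p, c) for p, c in in21k.samples if c in keep]
in21k.targets = [c for _, c in in21k.samples]
# In practice the retained class indices would also be remapped to a
# contiguous 0..1999 range before training a 2,000-way classifier.
```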
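
The Dataset Splits entry defines ASR_y as the proportion of validation images for which a crafted trigger causes misclassification as class y. A minimal sketch of that measurement, assuming a hypothetical `apply_trigger(images, y)` helper that stamps the class-y trigger onto a batch; the loop structure is illustrative, not the authors' evaluation code:

```python
import torch


@torch.no_grad()
def attack_success_rate(model, val_loader, target_class, apply_trigger, device="cuda"):
    """ASR_y: fraction of non-target validation images that the model
    classifies as `target_class` once the class-y trigger is applied."""
    model.eval()
    hits, total = 0, 0
    for images, labels in val_loader:
        # Only images whose true label is not y are counted, so a prediction
        # of `target_class` is a genuine misclassification.
        mask = labels != target_class
        if not mask.any():
            continue
        images = images[mask].to(device)
        # `apply_trigger` is a hypothetical helper that patches the
        # class-specific trigger pattern onto each image.
        preds = model(apply_trigger(images, target_class)).argmax(dim=1)
        hits += (preds == target_class).sum().item()
        total += images.size(0)
    return hits / max(total, 1)
```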
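
Finally, the Experiment Setup entry fully specifies the optimizer and schedule, so a training loop can be sketched directly in PyTorch. The data loaders and the `evaluate` callable are placeholders, and the `patience` value is an assumption, since the paper only states that training stops once accuracy stops improving or the model begins to overfit:

```python
import torch
from torch import nn


def train_classifier(model, train_loader, val_loader, evaluate,
                     epochs=90, lr_step=30, patience=5, device="cuda"):
    """Training loop matching the quoted setup: SGD with momentum 0.9,
    weight decay 1e-4, initial LR 0.1 decayed by 10x every `lr_step` epochs.
    For ImageNet-1K use epochs=90, lr_step=30; for the larger datasets the
    paper uses epochs=60, lr_step=20. `evaluate` is a user-supplied callable
    returning validation accuracy."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                step_size=lr_step, gamma=0.1)

    best_acc, bad_epochs = 0.0, 0
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:   # batch size 128 in the paper
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

        val_acc = evaluate(model, val_loader)
        # Simplified early stopping: halt once validation accuracy plateaus.
        if val_acc > best_acc:
            best_acc, bad_epochs = val_acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return model
```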