Poisoning and Backdooring Contrastive Learning

Authors: Nicholas Carlini, Andreas Terzis

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3-million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially desired label, are even easier, requiring control of 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable. |
| Researcher Affiliation | Industry | Nicholas Carlini (Google), Andreas Terzis (Google) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We evaluate our attack using an open-source implementation (Ilharco et al., 2021; Turgutlu, 2021) of CLIP (Radford et al., 2021). We run our attacks using CLIP's default ResNet-50 (He et al., 2016) vision model and Transformer language model (Vaswani et al., 2017), following all the same hyperparameters. |
| Open Datasets | Yes | We demonstrate the efficacy of our attack on two datasets: the 3-million-example Conceptual Captions dataset (Sharma et al., 2018) and the 15-million-example YFCC subset (Thomee et al., 2016). |
| Dataset Splits | Yes | In each experiment we choose a random target image x from the Conceptual Captions validation set, and then choose a random target class from the ImageNet test set. |
| Hardware Specification | Yes | All our experiments use a batch size of 1024, training across 8 V100 GPUs for 30 epochs with a learning rate of 0.0002, momentum SGD, and weight decay of 0.02. |
| Software Dependencies | No | The paper mentions using an "open-source implementation (Ilharco et al., 2021; Turgutlu, 2021) of CLIP" and "CLIP's default ResNet-50 (He et al., 2016) vision model and Transformer language model (Vaswani et al., 2017)", but does not specify version numbers for any software dependencies such as PyTorch, TensorFlow, or the CLIP library itself. |
| Experiment Setup | Yes | All our experiments use a batch size of 1024, training across 8 V100 GPUs for 30 epochs with a learning rate of 0.0002, momentum SGD, and weight decay of 0.02. |
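To make the attack setting in the Research Type row concrete, here is a minimal sketch of constructing poisoned image-caption pairs carrying a small trigger patch. The patch size, placement, caption template, and helper names are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of backdoor-style data poisoning for contrastive
# image-caption training, in the spirit of the attack described above.
# Patch size, placement, and caption wording are assumptions.
import random
from PIL import Image

PATCH_SIZE = 16  # assumed small trigger patch, in pixels

def apply_trigger(image: Image.Image, patch: Image.Image) -> Image.Image:
    """Overlay a small trigger patch at a random location in the image."""
    img = image.copy()
    x = random.randint(0, img.width - PATCH_SIZE)
    y = random.randint(0, img.height - PATCH_SIZE)
    img.paste(patch.resize((PATCH_SIZE, PATCH_SIZE)), (x, y))
    return img

def make_poisoned_pairs(images, target_label, patch, fraction=0.0001):
    """Turn a tiny fraction of images into poisoned (image, caption) pairs
    whose captions mention the adversarially desired target label."""
    n_poison = max(1, int(len(images) * fraction))
    poisoned = []
    for img in random.sample(images, n_poison):
        caption = f"a photo of a {target_label}"  # assumed caption template
        poisoned.append((apply_trigger(img, patch), caption))
    return poisoned
```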
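The Open Source Code row points to the open_clip implementation (Ilharco et al., 2021). As a hedged sketch, instantiating the ResNet-50 architecture used in the paper might look like the following; exact entry points depend on the open_clip version, and the paper trains from scratch on the (poisoned) dataset rather than loading released weights.

```python
# Sketch: instantiating the ResNet-50 CLIP architecture via open_clip
# (Ilharco et al., 2021). API names follow current open_clip releases
# and may differ from the version the authors used.
import open_clip

# No `pretrained` argument: weights stay randomly initialized, matching
# from-scratch training on the scraped dataset.
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms("RN50")
tokenizer = open_clip.get_tokenizer("RN50")
```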
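The Hardware Specification and Experiment Setup rows report the full optimization recipe. A minimal PyTorch sketch of that configuration follows; the momentum coefficient is not stated in the report, so 0.9 below is an assumption, and the linear layer is only a stand-in for the CLIP model.

```python
# Sketch of the reported training configuration: batch size 1024 across
# 8 V100 GPUs, 30 epochs, learning rate 2e-4, momentum SGD, weight
# decay 0.02. The stand-in model and momentum value are assumptions.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)  # placeholder for the CLIP image/text towers

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=2e-4,          # learning rate of 0.0002
    momentum=0.9,     # assumed; the report says only "momentum SGD"
    weight_decay=0.02,
)

BATCH_SIZE = 1024  # global batch, sharded across 8 V100 GPUs in the paper
EPOCHS = 30
```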