Understanding Zero-shot Adversarial Robustness for Large-Scale Models

Authors: Chengzhi Mao, Scott Geng, Junfeng Yang, Xin Wang, Carl Vondrick

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive evaluation on 15 zero-shot image datasets, offering a holistic study of the zero-shot adversarial robustness problem. Our best performing model with the TeCoA loss can improve adversarial robustness over CLIP by an average of 31% across the datasets.
Researcher Affiliation | Collaboration | Columbia University, Microsoft Research
Pseudocode | Yes | The paper provides pseudocode for five algorithms: Algorithm 1 (Standard Adversarial Training, Adv.), Algorithm 2 (Contrastive Adversarial Training Loss, CoAdv.), Algorithm 3 (Contrastive Adversarial Training over Images, ImgCoAdv.), Algorithm 4 (TeCoA Training), and Algorithm 5 (TeCoA Training on Unlabeled Data). A rough training sketch based on Algorithm 4 appears after this table.
Open Source Code | Yes | Overall, our approach significantly improves the zero-shot adversarial robustness over CLIP, seeing an average improvement of 31 points over ImageNet and 15 zero-shot datasets. Our code and model are available at github.com/cvlab-columbia/ZSRobust4FoundationModel.
Open Datasets | Yes | We evaluate the zero-shot adversarial robustness conferred by TeCoA trained on ImageNet (Deng et al., 2009a) and report the performance of the models on the ImageNet test set as well as 15 zero-shot test datasets, covering a diverse range of recognition tasks. Specifically, we include CIFAR10, CIFAR100 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), Caltech101 (Fei-Fei et al., 2004), and Caltech256 (Griffin et al., 2007) for generic classification; Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Food101 (Bossard et al., 2014), Flowers102 (Nilsback & Zisserman, 2008), and FGVCAircraft (Maji et al., 2013) for fine-grained classification; SUN397 (Xiao et al., 2010) for scene recognition; and DTD (Cimpoi et al., 2014) for texture recognition. Finally, we include three datasets with domain-specialized tasks: PatchCamelyon (PCAM, lymph node tumor detection) (Veeling et al., 2018), Hateful Memes (hate speech detection) (Kiela et al., 2020), and EuroSAT (satellite image classification) (Helber et al., 2017).
Dataset Splits | No | The paper states training on 'ImageNet' and evaluating on the 'ImageNet test set' and '15 zero-shot test datasets'. It does not explicitly specify a validation split or describe how a validation set from ImageNet was used for hyperparameter tuning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud computing resources.
Software Dependencies | No | The paper mentions using an 'SGD optimizer with momentum 0.9' but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) or their version numbers.
Experiment Setup | Yes | We optimize the model using an SGD optimizer with momentum 0.9. We train for 10 epochs. For finetuning, we use a learning rate of 1e-5. For visual prompt tuning, we use a learning rate of 40 and a batch size of 256. For prompt tuning on the entire ImageNet dataset, we use a token-level prompt of size 200, while for subsets of ImageNet (1K, 5K, and 50K images), we use a token-level prompt of size 5 and a smaller batch size of 64. Unless specified otherwise, during adversarial training we generate ℓ∞-bounded attacks with ε = 1/255 using a 2-step PGD attack (Madry et al., 2018) with step size α = 1/255. We test the robustness of our model using a 100-step PGD attack with step size α = 1/255.
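
To make the pseudocode and training recipe rows concrete, below is a minimal sketch of TeCoA-style text-guided contrastive adversarial training on CLIP, written against the openai/CLIP package. It is a reconstruction from the quoted descriptions, not the authors' released code; `class_names` and `train_loader` are hypothetical placeholders, and the ℓ∞ budget is applied in the model's input space as a simplification (the paper's ε is defined on raw pixel values).

```python
# Sketch of TeCoA-style adversarial training, assuming the openai/CLIP package.
# `class_names` (training class strings) and `train_loader` are placeholders.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep fp32 gradients for this sketch

# Frozen text embeddings for the training classes ("a photo of a <class>").
tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_emb = F.normalize(model.encode_text(tokens), dim=-1)

def clip_logits(images, text_emb):
    # Cosine-similarity logits between image embeddings and class text embeddings.
    img_emb = F.normalize(model.encode_image(images), dim=-1)
    return model.logit_scale.exp() * img_emb @ text_emb.t()

def pgd_attack(images, labels, text_emb, eps=1/255, alpha=1/255, steps=2):
    # L-inf PGD that maximizes the text-guided cross-entropy loss.
    # Simplification: the budget is applied in CLIP's normalized input space.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(clip_logits(images + delta, text_emb), labels)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (images + delta).detach()

# Fine-tune the image encoder on adversarial examples (lr 1e-5, momentum 0.9, 10 epochs).
optimizer = torch.optim.SGD(model.visual.parameters(), lr=1e-5, momentum=0.9)
for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        adv = pgd_attack(images, labels, text_emb)                   # 2-step PGD, eps = 1/255
        loss = F.cross_entropy(clip_logits(adv, text_emb), labels)   # TeCoA loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```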
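
Continuing the sketch above, the evaluation protocol from the experiment-setup row (100-step PGD with ε = α = 1/255 against a zero-shot text classifier) can be approximated as follows; `zs_class_names` and `test_loader` are hypothetical placeholders for one of the 15 downstream datasets.

```python
# Continuing the sketch above: robust zero-shot accuracy on a downstream dataset.
# `zs_class_names` and `test_loader` are placeholders for one of the 15 datasets.
zs_tokens = clip.tokenize([f"a photo of a {c}" for c in zs_class_names]).to(device)
with torch.no_grad():
    zs_text_emb = F.normalize(model.encode_text(zs_tokens), dim=-1)

correct = total = 0
for images, labels in test_loader:
    images, labels = images.to(device), labels.to(device)
    adv = pgd_attack(images, labels, zs_text_emb, steps=100)  # 100-step PGD evaluation
    with torch.no_grad():
        preds = clip_logits(adv, zs_text_emb).argmax(dim=-1)
    correct += (preds == labels).sum().item()
    total += labels.numel()
print(f"Robust zero-shot accuracy: {correct / total:.3f}")
```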