When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

Authors: Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, Yisen Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide the first and comprehensive study on the adversarial training recipe of ViTs via extensive evaluation of various training techniques across benchmark datasets.
Researcher Affiliation | Academia | (1) Key Lab. of Machine Perception (MoE), School of Intelligence Science and Technology, Peking University; (2) The University of Tokyo; (3) School of Mathematical Sciences, Peking University; (4) Independent Researcher; (5) Institute for Artificial Intelligence, Peking University
Pseudocode | Yes | The details of ARD and PRM based adversarial training for ViTs are summarized in Appendix C.
Open Source Code | Yes | Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers.
Open Datasets | Yes | Here, we use datasets of CIFAR-10 [36] and Imagenette [37] (a subset of 10 classes from ImageNet-1K).
Dataset Splits | Yes | Note that the latest version of Imagenette (imagenette-v2) reshuffles the sampled subset of ImageNet-1K and then splits the training and validation set.
Hardware Specification | No | The paper states it provides information on compute resources in Section 3, but Section 3 only mentions training settings and datasets, not specific hardware details like GPU/CPU models or types of clusters. For example, it does not specify 'type of GPUs' as indicated in the checklist.
Software Dependencies | No | The paper mentions 'pytorch-image-models' in a footnote, suggesting the use of PyTorch, but it does not specify version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | All models (unless otherwise specified) are pre-trained on ImageNet-1K and are adversarially trained for 40 epochs using SGD with weight decay 1e-4, and an initial learning rate 0.1 that is divided by 10 at the 36-th and 38-th epoch. Simple data augmentations such as random crop with padding and random horizontal flip are applied. During adversarial training, we use PGD-10 with step size 2/255 to craft adversarial examples.
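The quoted experiment setup maps onto a standard PGD adversarial training loop. Below is a minimal PyTorch sketch of that configuration, not the authors' released code. Several values are assumptions because they are not stated in the quoted text: the l_inf budget of 8/255 (only the 2/255 step size is quoted), SGD momentum 0.9, batch size 128, resizing CIFAR-10 inputs to 224x224, and the specific ViT variant loaded from timm.

```python
import timm                      # pytorch-image-models, as cited in the paper's footnote
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

EPOCHS, PGD_STEPS = 40, 10       # 40 training epochs, PGD-10 (both stated)
STEP_SIZE = 2 / 255              # PGD step size (stated)
EPSILON = 8 / 255                # ASSUMED perturbation budget (not quoted)

def pgd_attack(model, x, y):
    """Craft PGD-10 adversarial examples under an l_inf budget."""
    delta = torch.empty_like(x).uniform_(-EPSILON, EPSILON)
    for _ in range(PGD_STEPS):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            # Ascend along the gradient sign, then project back into the budget
            delta = (delta + STEP_SIZE * grad.sign()).clamp(-EPSILON, EPSILON)
            delta = (x + delta).clamp(0, 1) - x   # keep pixels in [0, 1]
    return (x + delta).detach()

# Simple augmentations quoted from the setup; the 224x224 resize for the
# ViT input resolution is an assumption of this sketch.
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.Resize(224),
    T.ToTensor(),
])
train_loader = DataLoader(
    CIFAR10("data", train=True, download=True, transform=transform),
    batch_size=128, shuffle=True)  # batch size assumed

# ImageNet-1K pre-trained ViT (stated); the exact variant is assumed.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9,  # momentum assumed
                weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[36, 38], gamma=0.1)

for epoch in range(EPOCHS):
    for x, y in train_loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
    scheduler.step()  # lr divided by 10 after the 36th and 38th epochs
```

Since `scheduler.step()` is called once per epoch, the `MultiStepLR` milestones of 36 and 38 reproduce the quoted schedule of dividing the learning rate by 10 at the 36th and 38th epochs. Note this sketch covers only the baseline recipe; the paper's ARD and PRM techniques (pseudocode in its Appendix C) are not reproduced here.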