When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture
Authors: Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, Yisen Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we provide the first and comprehensive study on the adversarial training recipe of ViTs via extensive evaluation of various training techniques across benchmark datasets. |
| Researcher Affiliation | Academia | 1 Key Lab. of Machine Perception (MoE), School of Intelligence Science and Technology, Peking University; 2 The University of Tokyo; 3 School of Mathematical Sciences, Peking University; 4 Independent Researcher; 5 Institute for Artificial Intelligence, Peking University |
| Pseudocode | Yes | The details of ARD and PRM based adversarial training for ViTs are summarized in Appendix C. |
| Open Source Code | Yes | Our code is available at https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers. |
| Open Datasets | Yes | Here, we use datasets of CIFAR-10 [36] and Imagenette [37] (a subset of 10 classes from ImageNet-1K). |
| Dataset Splits | Yes | Note that the latest version of Imagenette (imagenette-v2) reshuffles the sampled subset of ImageNet-1K and then splits the training and validation set. |
| Hardware Specification | No | The paper states that it provides information on compute resources in Section 3, but Section 3 only describes training settings and datasets, not specific hardware details such as GPU/CPU models or the type of compute cluster. For example, it does not specify the 'type of GPUs' as indicated in the checklist. |
| Software Dependencies | No | The paper mentions 'pytorch-image-models' in a footnote, which implies the use of PyTorch, but it does not specify version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | All models (unless otherwise specified) are pre-trained on ImageNet-1K and are adversarially trained for 40 epochs using SGD with weight decay 1e-4, and an initial learning rate 0.1 that is divided by 10 at the 36th and 38th epochs. Simple data augmentations such as random crop with padding and random horizontal flip are applied. During adversarial training, we use PGD-10 with step size 2/255 to craft adversarial examples. |
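
The "Experiment Setup" row above quotes the paper's training recipe. The sketch below is a minimal PyTorch rendering of such a PGD-10 adversarial training loop, not the authors' released implementation (their code is linked in the "Open Source Code" row). The perturbation budget of 8/255 and SGD momentum of 0.9 are common defaults assumed here rather than values quoted above, and `model` / `train_loader` are placeholders.

```python
# Minimal sketch of PGD-10 adversarial training with the quoted schedule:
# SGD, lr 0.1 (divided by 10 at epochs 36 and 38), weight decay 1e-4, 40 epochs.
# Assumed (not quoted): eps = 8/255, momentum = 0.9.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-inf PGD adversarial examples with a random start."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = (x + delta).clamp(0, 1) - x
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = delta.detach() + alpha * grad.sign()   # step size 2/255
        delta = delta.clamp(-eps, eps)
        delta = (x + delta).clamp(0, 1) - x            # keep images in [0, 1]
        delta.requires_grad_(True)
    return (x + delta).detach()


def adversarial_train(model, train_loader, epochs=40, device="cuda"):
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Learning rate divided by 10 at the 36th and 38th epochs.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[36, 38], gamma=0.1)
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)            # PGD-10
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
```

The paper's ARD and PRM techniques (summarized in its Appendix C) would modify how the adversarial examples are crafted; they are intentionally omitted from this generic sketch.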