Vision Transformers Are Robust Learners
Authors: Sayak Paul, Pin-Yu Chen
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), Big-Transfer. Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. (A minimal evaluation sketch illustrating this comparison follows the table.) |
| Researcher Affiliation | Industry | Sayak Paul¹* (Carted), Pin-Yu Chen²* (IBM Research); sayak@carted.com, pin-yu.chen@ibm.com |
| Pseudocode | No | The paper describes procedures and methods in paragraph form but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for reproducing our experiments is available at https://git.io/J3VO0. |
| Open Datasets | Yes | We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), Big-Transfer. We consistently observe a better performance across all the variants of ViT under different parameter regimes. We used 6 diverse ImageNet datasets concerning different types of robustness evaluation. |
| Dataset Splits | No | The paper refers to the "ImageNet-1k validation set" for sampling images for certain experiments but does not explicitly provide the specific percentages, sample counts, or a detailed methodology for creating the train/validation/test splits for reproduction. |
| Hardware Specification | No | The paper mentions "Google Cloud Platform credits" in the acknowledgements, but it does not specify any particular GPU models, CPU models, or detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper discusses training strategies such as Adam versus SGD and mentions dropout, but it does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) for training the main models. It only gives specific hyperparameters for adversarial attack generation (e.g., epsilon = 0.002 for PGD, 50 steps for DeepFool), which belong to specific analyses rather than the general experimental setup. (A hedged PGD sketch using this budget follows the table.) |
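To make the robustness comparison noted in the Research Type and Open Datasets rows concrete, here is a minimal evaluation-loop sketch for scoring a pretrained ViT on one robustness dataset. The `timm` checkpoint name (`vit_base_patch16_224`) and the dataset path are illustrative assumptions, not the exact models or data layout used in the paper.

```python
import timm
import torch
from timm.data import resolve_data_config, create_transform
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Hypothetical checkpoint and path -- stand-ins, not the paper's exact setup.
model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
transform = create_transform(**resolve_data_config({}, model=model))
dataset = ImageFolder("/path/to/robustness-variant", transform=transform)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)  # predicted class indices
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"top-1 accuracy: {correct / total:.4f}")
```

Note that subset benchmarks such as ImageNet-R cover 200 of the 1,000 ImageNet classes, so a real evaluation would remap the model's logits to the subset before scoring; the paper also reports mean corruption error (mCE) on ImageNet-C, which this sketch does not compute.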
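The one attack hyperparameter the paper does pin down is the PGD perturbation budget, epsilon = 0.002. Below is a minimal L-infinity PGD sketch using that budget; the step size (`alpha`) and step count (`steps`) are assumptions for illustration, not values reported by the authors.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, epsilon=0.002, alpha=0.0005, steps=10):
    """L-inf PGD: gradient ascent on the loss, projected into an epsilon ball.

    epsilon=0.002 is the budget quoted in the paper; alpha and steps are
    illustrative assumptions.
    """
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Take a signed gradient step, then project back into the epsilon
        # ball around the clean images and clamp to the valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = images + torch.clamp(adv - images, -epsilon, epsilon)
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```

With pixels scaled to [0, 1], epsilon = 0.002 corresponds to a per-pixel budget of roughly 0.5/255.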