Vision Transformers Are Robust Learners

Authors: Sayak Paul, Pin-Yu Chen

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six different diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), Big-Transfer. Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. (See the evaluation sketch after the table.)
Researcher Affiliation | Industry | Sayak Paul (Carted) and Pin-Yu Chen (IBM Research); sayak@carted.com, pin-yu.chen@ibm.com
Pseudocode | No | The paper describes procedures and methods in paragraph form but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for reproducing our experiments is available at https://git.io/J3VO0.
Open Datasets | Yes | We use six different diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and SOTA convolutional neural networks (CNNs), Big-Transfer. We consistently observe a better performance across all the variants of ViT under different parameter regimes. We used 6 diverse ImageNet datasets concerning different types of robustness evaluation.
Dataset Splits | No | The paper refers to the "ImageNet-1k validation set" for sampling images in certain experiments but does not explicitly provide the percentages, sample counts, or methodology for creating the train/validation/test splits needed for reproduction.
Hardware Specification | No | The paper mentions "Google Cloud Platform credits" in the acknowledgements, but it does not specify the GPU models, CPU models, or other hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | No | The paper discusses training strategies such as using Adam or SGD and mentions dropout, but it does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) for training the main models. It only gives specific hyperparameters for adversarial attack generation (e.g., epsilon = 0.002 for PGD and a step size of 50 for DeepFool), which belong to specific analyses rather than the general experimental setup. (See the PGD sketch after the table.)
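
To make the evaluation protocol in the Research Type row concrete, here is a minimal sketch of comparing a pretrained ViT against a BiT-style CNN on one slice of a corruption benchmark. It assumes PyTorch and timm are available and that corrupted images sit in an ImageNet-C-style folder tree (corruption/severity/class/image.jpg); the model names, path, and batch size are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: top-1 accuracy of a ViT vs. a BiT-style CNN on corrupted images.
# Assumptions (not from the paper): PyTorch + timm, an ImageNet-C-style folder
# tree at "imagenet-c/gaussian_noise/3", and illustrative timm model names.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

device = "cuda" if torch.cuda.is_available() else "cpu"

def top1_accuracy(model_name: str, data_root: str) -> float:
    model = timm.create_model(model_name, pretrained=True).to(device).eval()
    # Use the preprocessing pipeline the checkpoint was trained with.
    cfg = timm.data.resolve_data_config({}, model=model)
    transform = timm.data.create_transform(**cfg)
    # ImageFolder's alphabetical wnid ordering matches timm's ImageNet indices.
    loader = DataLoader(datasets.ImageFolder(data_root, transform),
                        batch_size=64, num_workers=4)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# One corruption/severity slice; sweeping corruptions and severities gives
# mCE-style aggregate numbers.
for name in ["vit_base_patch16_224", "resnetv2_101x1_bitm"]:
    print(name, top1_accuracy(name, "imagenet-c/gaussian_noise/3"))
```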
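The Experiment Setup row quotes epsilon = 0.002 for PGD. A minimal L-infinity PGD implementation at that budget might look as follows; the step size, iteration count, and the assumption that inputs lie in [0, 1] are illustrative choices, not details reported in the paper.

```python
# Hedged sketch of an L-infinity PGD attack at eps = 0.002 (the value quoted
# in the report). Step size, iteration count, and the [0, 1] input range are
# assumptions for illustration only.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.002, step=0.0005, iters=10):
    """Return adversarial examples within an L-inf ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # stay in valid pixel range
    return x_adv.detach()
```

Robust accuracy under this attack is then measured by scoring the model on pgd_attack(model, x, y) instead of the clean inputs x.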