Intriguing Properties of Vision Transformers

Authors: Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We systematically study this question via an extensive set of experiments encompassing three ViT families and provide comparisons with a high-performing convolutional neural network (CNN)." |
| Researcher Affiliation | Collaboration | Australian National University; Mohamed bin Zayed University of AI; Stony Brook University; Monash University; Linköping University; University of California, Merced; Yonsei University; Google Research |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code: https://git.io/Js15X |
| Open Datasets | Yes | "We consider visual recognition task with models pretrained on ImageNet [2]. The effect of occlusion is studied on the validation set (50k images)." |
| Dataset Splits | Yes | "We consider visual recognition task with models pretrained on ImageNet [2]. The effect of occlusion is studied on the validation set (50k images)." |
| Hardware Specification | Yes | "All the models are trained on 4 V100 GPUs." |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x). |
| Experiment Setup | Yes | "Thus, we train models on SIN without applying any augmentation, label smoothing or mixup." |
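The occlusion study referenced above drops a fraction of image patches before feeding images to the pretrained models. As a minimal sketch of that kind of patch occlusion, the snippet below zeroes a random subset of 16x16 patches in a NumPy image array. The function name `occlude_patches` and the zero-fill choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np


def occlude_patches(img, patch=16, drop_ratio=0.5, rng=None):
    """Zero out a random fraction of non-overlapping patches.

    img: array of shape (H, W, C), with H and W divisible by `patch`.
    Returns a copy with `drop_ratio` of the patches set to zero.
    """
    # Illustrative sketch: the paper's occlusion protocol may differ.
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = img.shape[0] // patch, img.shape[1] // patch
    n_drop = int(h * w * drop_ratio)
    # Pick which patch indices to occlude.
    dropped = rng.permutation(h * w)[:n_drop]
    out = img.copy()
    for i in dropped:
        r, c = divmod(int(i), w)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch, :] = 0
    return out


# Example: occlude half of the patches of a dummy 224x224 image.
image = np.ones((224, 224, 3), dtype=np.float32)
occluded = occlude_patches(image, drop_ratio=0.5)
```

With `drop_ratio=0.5` on a 224x224 image, exactly half of the 196 patches are zeroed, so half of the pixels become zero.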