Augmented Shortcuts for Vision Transformers

Authors: Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which brings about 1% accuracy increase of the state-of-the-art visual transformers without obviously increasing their parameters and FLOPs.
Researcher Affiliation | Collaboration | 1) Key Lab of Machine Perception (MOE), Dept. of Machine Intelligence, Peking University; 2) Huawei Noah's Ark Lab; 3) School of Computer Science, Faculty of Engineering, University of Sydney; 4) Central Software Institution, Huawei Technologies.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The model architecture can be found in the MindSpore model zoo: https://gitee.com/mindspore/models/tree/master/research/cv/augvit
Open Datasets | Yes | The ImageNet (ILSVRC-2012) dataset [6] contains 1.3M training images and 50k validation images from 1000 classes, which is a widely used image classification benchmark.
Dataset Splits | Yes | The ImageNet (ILSVRC-2012) dataset [6] contains 1.3M training images and 50k validation images from 1000 classes, which is a widely used image classification benchmark.
Hardware Specification | Yes | All experiments are conducted with PyTorch [28] and MindSpore on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using 'PyTorch' and 'MindSpore' but does not specify their version numbers.
Experiment Setup | Yes | Specifically, the model is trained with the AdamW [24] optimizer for 300 epochs with batch size 1024. The learning rate is initialized to 10^-3 and then decayed with a cosine schedule. Label smoothing [30], DropPath [21] and repeated augmentation [18] are also implemented following DeiT [34]. The data augmentation strategy contains RandAugment [5], Mixup [42] and CutMix [41].
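
For reference, the quoted experiment setup can be sketched as a minimal PyTorch training configuration. This is an illustrative sketch, not the authors' released code: only the optimizer (AdamW), initial learning rate (10^-3), cosine decay, 300 epochs, and batch size 1024 come from the quoted setup; the model constructor, weight-decay value, and training-loop helpers are hypothetical placeholders.

```python
# Minimal sketch of the reported training setup (assumptions noted in comments).
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 300        # reported in the paper
BATCH_SIZE = 1024   # reported in the paper
BASE_LR = 1e-3      # reported initial learning rate

def build_training_setup(model, epochs=EPOCHS, base_lr=BASE_LR, weight_decay=0.05):
    # AdamW with the reported initial learning rate; the weight-decay value
    # is an assumption (not quoted in this section).
    optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)
    # Cosine decay of the learning rate over the full training schedule.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Hypothetical usage:
# model = create_augvit_model()            # placeholder for the Aug-ViT backbone
# optimizer, scheduler = build_training_setup(model)
# for epoch in range(EPOCHS):
#     train_one_epoch(model, optimizer)    # placeholder loop; the DeiT recipe adds
#                                          # label smoothing, DropPath, repeated
#                                          # augmentation, RandAugment, Mixup, CutMix
#     scheduler.step()
```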