Augmented Shortcuts for Vision Transformers
Authors: Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which brings about 1% accuracy increase of the state-of-the-art visual transformers without obviously increasing their parameters and FLOPs. |
| Researcher Affiliation | Collaboration | 1Key Lab of Machine Perception (MOE), Dept. of Machine Intelligence, Peking University. 2Huawei Noah's Ark Lab. 3School of Computer Science, Faculty of Engineering, University of Sydney. 4Central Software Institution, Huawei Technologies. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The model architecture can be found in the MindSpore model zoo: https://gitee.com/mindspore/models/tree/master/research/cv/augvit |
| Open Datasets | Yes | ImageNet (ILSVRC-2012) dataset [6] contains 1.3M training images and 50k validation images from 1000 classes, which is a widely used image classification benchmark. |
| Dataset Splits | Yes | ImageNet (ILSVRC-2012) dataset [6] contains 1.3M training images and 50k validation images from 1000 classes, which is a widely used image classification benchmark. |
| Hardware Specification | Yes | All experiments are conducted with PyTorch [28] and MindSpore on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch and MindSpore but does not specify their version numbers. |
| Experiment Setup | Yes | Specifically, the model is trained with the AdamW [24] optimizer for 300 epochs with batch size 1024. The learning rate is initialized to 10^-3 and then decayed with the cosine schedule. Label smoothing [30], DropPath [21] and repeated augmentation [18] are also implemented following DeiT [34]. The data augmentation strategy contains RandAugment [5], Mixup [42] and CutMix [41]. |
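The reported setup (initial learning rate 10^-3, cosine decay over 300 epochs) can be sketched as a standalone schedule function. This is a minimal illustration, not the authors' training code; the final learning rate of 0 and the absence of warmup are assumptions, since the table does not state them.

```python
import math

# Values taken from the reported experiment setup.
BASE_LR = 1e-3      # initial learning rate
EPOCHS = 300        # total training epochs
MIN_LR = 0.0        # assumed final LR (not stated in the paper)

def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=EPOCHS, min_lr=MIN_LR):
    """Cosine-annealed learning rate at a given epoch (0-indexed).

    Decays smoothly from base_lr at epoch 0 to min_lr at the last epoch.
    """
    t = epoch / max(1, total_epochs - 1)  # progress in [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

# The schedule starts at 1e-3 and decays monotonically toward 0.
schedule = [cosine_lr(e) for e in range(EPOCHS)]
```

In a PyTorch training loop this corresponds to pairing `torch.optim.AdamW` with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)`.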