Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Authors: Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that both of our proposed architectures perform favorably against other state-of-the-art vision transformers with similar or even reduced computational complexity. We benchmark our proposed architectures on a number of visual tasks, ranging from image-level classification to pixel-level semantic/instance segmentation and object detection. |
| Researcher Affiliation | Collaboration | ¹Meituan Inc.; ²The University of Adelaide, Australia |
| Pseudocode | Yes | The PyTorch code of LSA is given in Algorithm 1 (in supplementary). (A hedged sketch of the idea appears below this table.) |
| Open Source Code | Yes | Our code is available at: https://git.io/Twins. |
| Open Datasets | Yes | We first present the ImageNet classification results with our proposed models. We test on the ADE20K dataset [42], a challenging scene parsing task for semantic segmentation... We evaluate the performance of our method using two representative frameworks: RetinaNet [46] and Mask R-CNN [47]. Specifically, we report standard 1× schedule (12 epochs) detection results on the COCO 2017 dataset [48]. |
| Dataset Splits | Yes | This dataset contains 20K images for training and 2K images for validation. |
| Hardware Specification | Yes | Throughput is tested with a batch size of 192 on a single V100 GPU. (A rough throughput probe is sketched below this table.) |
| Software Dependencies | No | The paper mentions software like PyTorch, TensorRT, and MMDetection, but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | All our models are trained for 300 epochs with a batch size of 1024 using the AdamW optimizer [37]. The learning rate is initialized to be 0.001 and decayed to zero within 300 epochs following the cosine strategy. We use a linear warm-up in the first five epochs and the same regularization setting as in [2]. (A schedule sketch follows this table.) |
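
The LSA pseudocode itself lives in the paper's supplementary (Algorithm 1). As a rough illustration only, a minimal PyTorch sketch of locally-grouped self-attention, where the feature map is partitioned into non-overlapping sub-windows and standard self-attention is applied within each, could look like the following. The class name, the window size of 7, and the use of `nn.MultiheadAttention` are assumptions for readability, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocallyGroupedSelfAttention(nn.Module):
    """Sketch of LSA: split the H x W feature map into k x k windows
    and run multi-head self-attention inside each window only."""

    def __init__(self, dim, num_heads=8, window=7):
        super().__init__()
        self.window = window  # assumed window size; dim must divide by num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        # x: (B, H*W, C); assumes H and W are divisible by the window size
        B, N, C = x.shape
        k = self.window
        # regroup tokens window-wise: (B * num_windows, k*k, C)
        x = x.view(B, H // k, k, W // k, k, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, k * k, C)
        # attention is restricted to tokens within the same window
        x, _ = self.attn(x, x, x)
        # restore the original (B, H*W, C) token layout
        x = x.view(B, H // k, W // k, k, k, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)
        return x
```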
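
For the hardware row, a rough single-GPU throughput probe in the spirit of the quoted protocol (batch size 192, one V100) could be written as below. The 224×224 input resolution, warm-up count, and iteration count are assumptions; the quote fixes only the batch size and GPU.

```python
import time
import torch

@torch.no_grad()
def throughput(model, batch_size=192, image_size=224, warmup=10, iters=30):
    """Return an approximate images/s figure for one CUDA GPU."""
    device = torch.device("cuda")
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(warmup):      # warm-up passes to stabilize clocks/caches
        model(x)
    torch.cuda.synchronize()     # start timing only after pending kernels finish
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()     # wait for all kernels before stopping the clock
    return iters * batch_size / (time.time() - start)
```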
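
The experiment-setup row pins down the optimizer and schedule: AdamW, 300 epochs, batch size 1024, base learning rate 0.001, cosine decay to zero, and a 5-epoch linear warm-up. A minimal PyTorch sketch of that schedule follows; the weight-decay value and the placeholder model are assumptions, as the quoted passage does not state them.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import SequentialLR, LinearLR, CosineAnnealingLR

EPOCHS, WARMUP_EPOCHS, BASE_LR = 300, 5, 1e-3

model = torch.nn.Linear(8, 8)  # placeholder; substitute a Twins model here
# weight_decay=0.05 is an assumed DeiT-style value, not given in the quote
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

# 5-epoch linear warm-up, then cosine decay to zero over the remaining epochs
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS),
        CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS, eta_min=0.0),
    ],
    milestones=[WARMUP_EPOCHS],
)

for epoch in range(EPOCHS):
    pass  # one training epoch over ImageNet with global batch size 1024
    scheduler.step()  # schedule is stepped per epoch, matching the quote
```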