Twins: Revisiting the Design of Spatial Attention in Vision Transformers

Authors: Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that both of our proposed architectures perform favorably against other state-of-the-art vision transformers with similar or even reduced computational complexity. We benchmark our proposed architectures on a number of visual tasks, ranging from image-level classification to pixel-level semantic/instance segmentation and object detection.
Researcher Affiliation | Collaboration | 1. Meituan Inc.; 2. The University of Adelaide, Australia
Pseudocode | Yes | The PyTorch code of LSA is given in Algorithm 1 (in the supplementary). (A hedged sketch of the idea follows the table.)
Open Source Code | Yes | Our code is available at: https://git.io/Twins.
Open Datasets | Yes | We first present the ImageNet classification results with our proposed models. We test on the ADE20K dataset [42], a challenging scene parsing task for semantic segmentation... We evaluate the performance of our method using two representative frameworks: RetinaNet [46] and Mask R-CNN [47]. Specifically, we report standard 1× schedule (12 epochs) detection results on the COCO 2017 dataset [48].
Dataset Splits | Yes | This dataset contains 20K images for training and 2K images for validation.
Hardware Specification | Yes | Throughput is tested with a batch size of 192 on a single V100 GPU.
Software Dependencies | No | The paper mentions software such as PyTorch, TensorRT, and MMDetection, but does not specify their version numbers for reproducibility.
Experiment Setup | Yes | All our models are trained for 300 epochs with a batch size of 1024 using the AdamW optimizer [37]. The learning rate is initialized to 0.001 and decayed to zero within 300 epochs following the cosine strategy. We use a linear warm-up in the first five epochs and the same regularization setting as in [2]. (A configuration sketch follows the table.)
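
For readers who want a concrete picture of the LSA pseudocode referenced above: in Twins, LSA denotes locally-grouped self-attention, i.e., standard multi-head self-attention applied independently within non-overlapping sub-windows of the feature map. The snippet below is a minimal PyTorch sketch of that idea, not the authors' Algorithm 1; the class name `LocallyGroupedSelfAttention`, the square window size `ws`, and the assumption that H and W are divisible by `ws` are illustrative choices. The released code at https://git.io/Twins is the authoritative implementation.

```python
# Hedged sketch of locally-grouped self-attention (LSA): attention is computed
# independently inside each ws x ws sub-window of the feature map.
# Class and argument names are illustrative, not the authors' Algorithm 1.
import torch
import torch.nn as nn


class LocallyGroupedSelfAttention(nn.Module):  # hypothetical name
    def __init__(self, dim, num_heads=8, ws=7):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.ws = ws                                # sub-window size (assumed square)
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W; H, W assumed divisible by ws
        B, N, C = x.shape
        ws = self.ws

        # Partition the feature map into (H/ws) * (W/ws) non-overlapping sub-windows.
        x = x.reshape(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)   # (B*windows, ws*ws, C)

        # Standard multi-head self-attention within each sub-window.
        qkv = self.qkv(x).reshape(x.shape[0], ws * ws, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                      # each: (B*win, heads, ws*ws, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(-1, ws * ws, C)

        # Reverse the window partition back to a (B, N, C) sequence.
        x = x.reshape(B, H // ws, W // ws, ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)
        return self.proj(x)
```

In the full Twins-SVT model, LSA blocks are interleaved with global sub-sampled attention (GSA) blocks to restore cross-window communication; that part is omitted in this sketch.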
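
The Experiment Setup row likewise translates into a small optimizer/scheduler configuration. Below is a hedged sketch of that recipe (AdamW, base learning rate 0.001, cosine decay to zero over 300 epochs, 5-epoch linear warm-up, effective batch size 1024). The placeholder model, the weight-decay value, and the per-epoch (rather than per-iteration) scheduling are assumptions not stated in the quoted passage, and the regularization settings inherited from [2] are not reproduced here.

```python
# Hedged sketch of the training recipe quoted in the Experiment Setup row:
# AdamW, base lr 1e-3, cosine decay to zero over 300 epochs, 5-epoch linear warm-up.
# The model, the weight decay, and per-epoch scheduling are assumptions.
import math
import torch

EPOCHS, WARMUP_EPOCHS, BASE_LR = 300, 5, 1e-3

model = torch.nn.Linear(192, 1000)   # placeholder standing in for a Twins model
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)  # weight decay assumed

def lr_lambda(epoch):
    # Linear warm-up over the first 5 epochs, then cosine decay to zero.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(EPOCHS):
    # ... one epoch of training with an effective batch size of 1024 ...
    scheduler.step()
```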