Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Authors: Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that both of our proposed architectures perform favorably against other state-of-the-art vision transformers with similar or even reduced computational complexity. We benchmark our proposed architectures on a number of visual tasks, ranging from image-level classification to pixel-level semantic/instance segmentation and object detection. |
| Researcher Affiliation | Collaboration | Meituan Inc.; The University of Adelaide, Australia |
| Pseudocode | Yes | The PyTorch code of LSA is given in Algorithm 1 (in supplementary). |
| Open Source Code | Yes | Our code is available at: https://git.io/Twins. |
| Open Datasets | Yes | We first present the ImageNet classification results with our proposed models. We test on the ADE20K dataset [42], a challenging scene parsing task for semantic segmentation... We evaluate the performance of our method using two representative frameworks: RetinaNet [46] and Mask R-CNN [47]. Specifically, we report standard 1× schedule (12 epochs) detection results on the COCO 2017 dataset [48]. |
| Dataset Splits | Yes | This dataset contains 20K images for training and 2K images for validation. |
| Hardware Specification | Yes | Throughput is tested on the batch size of 192 on a single V100 GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch, TensorRT, and MMDetection, but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | All our models are trained for 300 epochs with a batch size of 1024 using the AdamW optimizer [37]. The learning rate is initialized to be 0.001 and decayed to zero within 300 epochs following the cosine strategy. We use a linear warm-up in the first five epochs and the same regularization setting as in [2]. |
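
The Experiment Setup row describes a learning-rate schedule: a linear warm-up over the first five epochs, then cosine decay from 0.001 to zero by epoch 300. The Python sketch below illustrates that shape under stated assumptions: the warm-up starts from zero, the cosine phase spans the remaining 295 epochs, and the rate is evaluated per (possibly fractional) epoch. The function name `lr_at_epoch` is hypothetical, and this is not the authors' released code, which may use a nonzero warm-up start value or a different update granularity.

```python
import math

BASE_LR = 1e-3       # initial learning rate stated in the setup
TOTAL_EPOCHS = 300   # total training epochs stated in the setup
WARMUP_EPOCHS = 5    # length of the linear warm-up stated in the setup


def lr_at_epoch(epoch: float,
                base_lr: float = BASE_LR,
                total: int = TOTAL_EPOCHS,
                warmup: int = WARMUP_EPOCHS) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Hypothetical helper. Assumes warm-up ramps linearly from 0 to base_lr,
    then a half-cosine decays from base_lr to 0 over the remaining epochs.
    """
    if epoch < warmup:
        # Linear warm-up (assumed to start at 0): reaches base_lr at epoch == warmup.
        return base_lr * epoch / warmup
    # Fraction of the cosine phase completed, clamped so the rate stays at 0 after `total`.
    progress = min((epoch - warmup) / (total - warmup), 1.0)
    # Half-cosine: base_lr at progress 0, base_lr / 2 at progress 0.5, 0 at progress 1.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 2.5, 5, 152.5, 300):
        print(f"epoch {e:>6}: lr = {lr_at_epoch(e):.6f}")
```

Because the function accepts fractional epochs, a per-iteration variant only needs `epoch = step / steps_per_epoch`. PyTorch's built-in `LinearLR`, `CosineAnnealingLR`, and `SequentialLR` schedulers can approximate the same shape, although `LinearLR` cannot start from exactly zero.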