ResT: An Efficient Transformer for Visual Recognition

Authors: Qinglong Zhang, Yu-Bin Yang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that the proposed ResT can outperform the recently state-of-the-art backbones by a large margin, demonstrating the potential of ResT as strong backbones."
Researcher Affiliation | Academia | "Qing-Long Zhang, Yu-Bin Yang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. wofmanaf@smail.nju.edu.cn, yangyubin@nju.edu.cn"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code and models will be made publicly available at https://github.com/wofmanaf/ResT."
Open Datasets | Yes | "We comprehensively validate the effectiveness of the proposed ResT on the commonly used benchmarks, including image classification on ImageNet-1k and downstream tasks, such as object detection and instance segmentation on MS COCO2017."
Dataset Splits | Yes | "ImageNet-1k, which contains 1.28M training images and 50k validation images from 1,000 classes." and "MS COCO2017, which contains 118k training, 5k validation, and 20k test-dev images."
Hardware Specification | Yes | "Throughput (images / s) is measured on a single V100 GPU, following [26]." and "A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used."
Software Dependencies | No | The paper mentions software components and techniques such as the AdamW optimizer, GELU activation, Instance Normalization, Batch Normalization, ReLU activation, and Layer Normalization, but it does not provide specific version numbers for any of these software dependencies or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | "For image classification, we employ the AdamW [20] optimizer for 300 epochs using a cosine decay learning rate scheduler and 5 epochs of linear warm-up. A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used."
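
Below is a minimal PyTorch sketch of the optimization setup quoted in the Experiment Setup row (AdamW, 300 epochs, 5 warm-up epochs, cosine decay, initial learning rate 5e-4, weight decay 0.05, gradient clipping at max norm 5). It is not the authors' released code; the function name `build_optimizer_and_scheduler` and the `steps_per_epoch` argument are placeholders for illustration.

```python
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, steps_per_epoch, epochs=300,
                                  warmup_epochs=5, base_lr=5e-4,
                                  weight_decay=0.05):
    # AdamW with the learning rate and weight decay reported in the paper.
    optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = epochs * steps_per_epoch

    def lr_lambda(step):
        # Linear warm-up for the first 5 epochs, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Inside the training loop, gradients are clipped before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
#   optimizer.step(); scheduler.step()
```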
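
The Hardware Specification row notes that throughput (images / s) is measured on a single V100 GPU following reference [26] of the paper. The exact protocol (batch size, number of timed iterations) is not quoted, so the following is a generic single-GPU throughput sketch with assumed defaults rather than the authors' procedure.

```python
import time
import torch

@torch.no_grad()
def measure_throughput(model, batch_size=64, image_size=224,
                       warmup_iters=10, timed_iters=30, device="cuda"):
    # Assumed defaults; the protocol of [26] may use different values.
    model = model.to(device).eval()
    images = torch.randn(batch_size, 3, image_size, image_size, device=device)

    for _ in range(warmup_iters):   # warm-up passes excluded from timing
        model(images)
    torch.cuda.synchronize()

    start = time.time()
    for _ in range(timed_iters):
        model(images)
    torch.cuda.synchronize()        # wait for queued GPU kernels to finish

    return timed_iters * batch_size / (time.time() - start)  # images per second
```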