ResT: An Efficient Transformer for Visual Recognition

Authors: Qinglong Zhang, Yu-Bin Yang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that the proposed ResT can outperform the recently state-of-the-art backbones by a large margin, demonstrating the potential of ResT as strong backbones."
Researcher Affiliation | Academia | "Qing-Long Zhang, Yu-Bin Yang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. wofmanaf@smail.nju.edu.cn, yangyubin@nju.edu.cn"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code and models will be made publicly available at https://github.com/wofmanaf/ResT."
Open Datasets | Yes | "We comprehensively validate the effectiveness of the proposed ResT on the commonly used benchmarks, including image classification on ImageNet-1k and downstream tasks, such as object detection and instance segmentation on MS COCO2017."
Dataset Splits | Yes | "ImageNet-1k, which contains 1.28M training images and 50k validation images from 1,000 classes." and "MS COCO2017, which contains 118k training, 5k validation, and 20k test-dev images."
Hardware Specification | Yes | "Throughput (images / s) is measured on a single V100 GPU, following [26]." and "A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used."
Software Dependencies | No | The paper mentions software components and techniques such as the AdamW optimizer, GELU activation, Instance Normalization, Batch Normalization, ReLU activation, and Layer Normalization, but it does not provide specific version numbers for any of these software dependencies or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | "For image classification, we employ the AdamW [20] optimizer for 300 epochs using a cosine decay learning rate scheduler and 5 epochs of linear warm-up. A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used."
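
Below is a minimal PyTorch sketch of the optimization setup quoted in the Experiment Setup row (AdamW, 300 epochs, 5 warm-up epochs, cosine decay, initial learning rate 5e-4, weight decay 0.05, gradient clipping at max norm 5). It is not the authors' released code; the function name `build_optimizer_and_scheduler` and the `steps_per_epoch` argument are placeholders for illustration.

```python
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, steps_per_epoch, epochs=300,
                                  warmup_epochs=5, base_lr=5e-4,
                                  weight_decay=0.05):
    # AdamW with the learning rate and weight decay reported in the paper.
    optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = epochs * steps_per_epoch

    def lr_lambda(step):
        # Linear warm-up for the first 5 epochs, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Inside the training loop, gradients are clipped before each optimizer step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
#   optimizer.step(); scheduler.step()
```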
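
The Hardware Specification row notes that throughput (images / s) is measured on a single V100 GPU following reference [26] of the paper. The exact protocol (batch size, number of timed iterations) is not quoted, so the following is a generic single-GPU throughput sketch with assumed defaults rather than the authors' procedure.

```python
import time
import torch

@torch.no_grad()
def measure_throughput(model, batch_size=64, image_size=224,
                       warmup_iters=10, timed_iters=30, device="cuda"):
    # Assumed defaults; the protocol of [26] may use different values.
    model = model.to(device).eval()
    images = torch.randn(batch_size, 3, image_size, image_size, device=device)

    for _ in range(warmup_iters):   # warm-up passes excluded from timing
        model(images)
    torch.cuda.synchronize()

    start = time.time()
    for _ in range(timed_iters):
        model(images)
    torch.cuda.synchronize()        # wait for queued GPU kernels to finish

    return timed_iters * batch_size / (time.time() - start)  # images per second
```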