ResT: An Efficient Transformer for Visual Recognition
Authors: Qing-Long Zhang, Yu-Bin Yang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively validate ResT on image classification and downstream tasks. Experimental results show that the proposed ResT can outperform the recent state-of-the-art backbones by a large margin, demonstrating the potential of ResT as a strong backbone. |
| Researcher Affiliation | Academia | Qing-Long Zhang, Yu-Bin Yang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; wofmanaf@smail.nju.edu.cn, yangyubin@nju.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models will be made publicly available at https://github.com/wofmanaf/ResT. |
| Open Datasets | Yes | We comprehensively validate the effectiveness of the proposed ResT on commonly used benchmarks, including image classification on ImageNet-1k and downstream tasks such as object detection and instance segmentation on MS COCO2017. |
| Dataset Splits | Yes | "ImageNet-1k, which contains 1.28M training images and 50k validation images from 1,000 classes." and "MS COCO2017, which contains 118k training, 5k validation, and 20k test-dev images." |
| Hardware Specification | Yes | "Throughput (images/s) is measured on a single V100 GPU, following [26]." and "A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used." (A throughput-measurement sketch follows the table.) |
| Software Dependencies | No | The paper mentions software components and techniques such as the AdamW optimizer, GELU activation, Instance Normalization, Batch Normalization, ReLU activation, and Layer Normalization, but it does not provide version numbers for any software dependencies or frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For image classification, we employ the AdamW [20] optimizer for 300 epochs using a cosine decay learning rate scheduler and 5 epochs of linear warm-up. A batch size of 2048 (using 8 GPUs with 256 images per GPU), an initial learning rate of 5e-4, a weight decay of 0.05, and gradient clipping with a max norm of 5 are used. (A configuration sketch follows the table.) |
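The throughput figure in the Hardware Specification row follows the usual single-GPU timing protocol: time repeated forward passes of a fixed batch after warm-up. Below is a minimal sketch of that protocol, assuming PyTorch; the batch size, iteration counts, and the torchvision ResNet-50 stand-in for ResT are illustrative choices, not the authors' script.

```python
import time

import torch
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in backbone: torchvision's ResNet-50. The paper times ResT, whose
# reference implementation lives at https://github.com/wofmanaf/ResT.
model = models.resnet50().to(device).eval()

batch_size = 64  # assumed; the paper does not state its timing batch size
images = torch.randn(batch_size, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations to amortize one-time costs
        model(images)
    if device.type == "cuda":
        torch.cuda.synchronize()  # drain queued kernels before starting the clock
    start = time.time()
    n_iters = 50
    for _ in range(n_iters):
        model(images)
    if device.type == "cuda":
        torch.cuda.synchronize()  # ensure all timed work has finished
    elapsed = time.time() - start

print(f"Throughput: {batch_size * n_iters / elapsed:.1f} images/s")
```

The synchronize calls matter on a GPU such as the V100: CUDA kernel launches are asynchronous, so timing without them under-counts the actual compute.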
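The Experiment Setup row pins down every optimization hyperparameter. The sketch below shows one way those numbers map onto a standard PyTorch training loop; the toy model, synthetic loader, and the SequentialLR-based warm-up are assumptions for illustration, not the authors' code.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Toy stand-ins so the sketch runs end to end; the paper trains ResT variants
# on ImageNet-1k with an effective batch size of 2048 (8 GPUs x 256 images).
model = nn.Linear(3 * 224 * 224, 1000)
criterion = nn.CrossEntropyLoss()
train_loader = [(torch.randn(8, 3 * 224 * 224), torch.randint(0, 1000, (8,)))]

epochs, warmup_epochs = 300, 5
optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)

# 5 epochs of linear warm-up, then cosine decay over the remaining 295 epochs.
# start_factor is an assumed value; the paper only says "linear warm-up".
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_epochs),
        CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(epochs):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        # Gradient clipping with a max norm of 5, as reported in the paper.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
        optimizer.step()
    scheduler.step()  # this schedule steps once per epoch
```

In a real 8-GPU run the loop would be wrapped in DistributedDataParallel with 256 images per process; the schedule and the clipping step are unchanged.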