ResT V2: Simpler, Faster and Stronger
Authors: Qinglong Zhang, Yu-Bin Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively validate ResTv2 on ImageNet classification, COCO detection, and ADE20K semantic segmentation. Experimental results show that the proposed ResTv2 can outperform the recently state-of-the-art backbones by a large margin, demonstrating the potential of ResTv2 as solid backbones. |
| Researcher Affiliation | Academia | Qing-Long Zhang, Yu-Bin Yang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. wofmanaf@smail.nju.edu.cn, yangyubin@nju.edu.cn |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2) and mathematical equations, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models will be made publicly available at https://github.com/wofmanaf/ResT. |
| Open Datasets | Yes | The ImageNet-1k dataset consists of 1.28M training images and 50k validation images from 1,000 classes. [...] Object detection and instance segmentation experiments are conducted on COCO 2017 [...] ADE20K contains a broad range of 150 semantic categories. It has 25K images in total, with 20K for training, 2K for validation, and another 3K for testing. (All three datasets are well-known public benchmarks.) |
| Dataset Splits | Yes | The ImageNet-1k dataset consists of 1.28M training images and 50k validation images from 1,000 classes. [...] Object detection and instance segmentation experiments are conducted on COCO 2017, which contains 118K training, 5K validation, and 20K test-dev images. [...] ADE20K [...] has 25K images in total, with 20K for training, 2K for validation, and another 3K for testing. |
| Hardware Specification | Yes | Inference throughput (images / s) is measured on a V100 GPU, following [47]. |
| Software Dependencies | No | The paper mentions using AdamW and refers to other tools and frameworks (e.g., MMSegmentation), but it does not specify concrete version numbers for any software dependencies required for replication. |
| Experiment Setup | Yes | We train ResTv2 for 300 epochs using AdamW [25], with a cosine decay learning rate scheduler and 50 epochs of linear warm-up. An initial learning rate of 1.5e-4 × batch_size / 256, a weight decay of 0.05, and gradient clipping with a max norm of 1.0 are used. For data augmentations, we adopt common schemes including Mixup [46], Cutmix [44], RandAugment [8], and Random Erasing [48]. |
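The learning-rate schedule quoted above (linear scaling by batch size, linear warm-up, then cosine decay) can be sketched in a few lines. This is an illustrative reconstruction from the reported hyperparameters, not the authors' released training code; the function name and per-epoch granularity are assumptions.

```python
import math

def lr_at_epoch(epoch, total_epochs=300, warmup_epochs=50,
                batch_size=1024, base_lr=1.5e-4, ref_batch=256):
    """Hypothetical per-epoch schedule matching the paper's description:
    peak_lr = 1.5e-4 * batch_size / 256 (linear scaling rule),
    linear warm-up for the first 50 epochs, cosine decay afterwards."""
    peak_lr = base_lr * batch_size / ref_batch
    if epoch < warmup_epochs:
        # ramp linearly from peak_lr/warmup_epochs up to peak_lr
        return peak_lr * (epoch + 1) / warmup_epochs
    # cosine decay from peak_lr toward 0 over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For example, with a global batch size of 1024 the peak learning rate would be 1.5e-4 × 1024 / 256 = 6e-4, reached at the end of warm-up (epoch 49) and decayed toward zero by epoch 299.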