ResT V2: Simpler, Faster and Stronger

Authors: Qinglong Zhang, Yu-Bin Yang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively validate ResTv2 on ImageNet classification, COCO detection, and ADE20K semantic segmentation. Experimental results show that the proposed ResTv2 can outperform the recent state-of-the-art backbones by a large margin, demonstrating the potential of ResTv2 as solid backbones.
Researcher Affiliation | Academia | Qing-Long Zhang, Yu-Bin Yang, State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. wofmanaf@smail.nju.edu.cn, yangyubin@nju.edu.cn
Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2) and mathematical equations, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and models will be made publicly available at https://github.com/wofmanaf/ResT.
Open Datasets | Yes | The ImageNet-1k dataset consists of 1.28M training images and 50k validation images from 1,000 classes. [...] Object detection and instance segmentation experiments are conducted on COCO 2017 [...] ADE20K contains a broad range of 150 semantic categories. It has 25K images in total, with 20K for training, 2K for validation, and another 3K for testing. (All three datasets are well-known public benchmarks.)
Dataset Splits | Yes | The ImageNet-1k dataset consists of 1.28M training images and 50k validation images from 1,000 classes. [...] Object detection and instance segmentation experiments are conducted on COCO 2017, which contains 118K training, 5K validation, and 20K test-dev images. [...] ADE20K [...] has 25K images in total, with 20K for training, 2K for validation, and another 3K for testing.
Hardware Specification | Yes | Inference throughput (images/s) is measured on a V100 GPU, following [47]. (A hedged throughput-measurement sketch follows the table.)
Software Dependencies | No | The paper mentions using AdamW and refers to other tools and frameworks (e.g., MMSegmentation), but it does not specify concrete version numbers for any software dependencies required for replication.
Experiment Setup | Yes | We train ResTv2 for 300 epochs using AdamW [25], with a cosine decay learning rate scheduler and 50 epochs of linear warm-up. An initial learning rate of 1.5e-4 × batch_size / 256, a weight decay of 0.05, and gradient clipping with a max norm of 1.0 are used. For data augmentations, we adopt common schemes including Mixup [46], Cutmix [44], RandAugment [8], and Random Erasing [48]. (A hedged sketch of this optimization recipe follows the table.)
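
For readers attempting to reproduce the quoted ImageNet-1k recipe, below is a minimal sketch of the optimizer and learning-rate schedule. The hyperparameters (300 epochs, AdamW, 50-epoch linear warm-up into cosine decay, lr = 1.5e-4 × batch_size / 256, weight decay 0.05, gradient clipping at max-norm 1.0) come directly from the excerpt; the model and data loader are placeholders, per-epoch (rather than per-iteration) scheduling is an assumption, and the augmentation pipeline (Mixup, CutMix, RandAugment, Random Erasing, typically provided by a library such as timm) is omitted here.

```python
# Sketch of the quoted optimization recipe; not the authors' released training code.
import math
import torch

def build_optimizer_and_scheduler(model, batch_size, epochs=300, warmup_epochs=50):
    base_lr = 1.5e-4 * batch_size / 256  # linear lr scaling rule from the paper
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:  # linear warm-up over the first 50 epochs
            return (epoch + 1) / warmup_epochs
        # cosine decay over the remaining epochs (assumed per-epoch stepping)
        progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

def train_one_epoch(model, loader, criterion, optimizer, device="cuda"):
    model.train()
    for images, targets in loader:  # loader is a placeholder ImageNet-1k DataLoader
        images, targets = images.to(device), targets.to(device)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        # gradient clipping with max norm 1.0, as stated in the paper
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```

After each epoch, `scheduler.step()` would be called to advance the warm-up/cosine schedule; whether the released code steps per epoch or per iteration is not specified in the excerpt.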
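
The throughput protocol itself is not described in the excerpt beyond "measured on a V100 GPU, following [47]", so the sketch below only illustrates the usual batched, GPU-synchronized timing loop. The backbone (a torchvision ResNet-50 stand-in rather than ResTv2), batch size, and iteration counts are placeholder assumptions.

```python
# Minimal throughput-measurement sketch (images/s on a single GPU); values are illustrative.
import time
import torch
import torchvision

model = torchvision.models.resnet50().cuda().eval()  # stand-in backbone, not ResTv2
batch_size, warmup_iters, timed_iters = 64, 10, 50
images = torch.randn(batch_size, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(warmup_iters):   # warm up CUDA kernels / cuDNN autotuner
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(timed_iters):
        model(images)
    torch.cuda.synchronize()        # wait for all GPU work before stopping the clock
    elapsed = time.time() - start

print(f"throughput: {batch_size * timed_iters / elapsed:.1f} images/s")
```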