Vision Transformer Adapter for Dense Predictions
Authors: Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify ViT-Adapter on multiple dense prediction tasks, including object detection, instance segmentation, and semantic segmentation. Notably, without using extra detection data, our ViT-Adapter-L yields state-of-the-art 60.9 box AP and 53.0 mask AP on COCO test-dev. We hope that the ViT-Adapter could serve as an alternative for vision-specific transformers and facilitate future research. We evaluate the ViT-Adapter on multiple challenging benchmarks, including COCO (Lin et al., 2014) and ADE20K (Zhou et al., 2017). As shown in Figure 2, our models consistently achieve improved performance compared to the prior arts under the fair pre-training strategy. |
| Researcher Affiliation | Collaboration | Zhe Chen (1,2), Yuchen Duan (2,3), Wenhai Wang (2), Junjun He (2), Tong Lu (1), Jifeng Dai (2,3), Yu Qiao (2); affiliations: 1 Nanjing University, 2 Shanghai AI Laboratory, 3 Tsinghua University; emails: czcz94cz@gmail.com, {duanyuchen,wangwenhai,hejunjun}@pjlab.org.cn, lutong@nju.edu.cn, {daijifeng,qiaoyu}@pjlab.org.cn |
| Pseudocode | No | The paper does not contain any pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | No | Code and models will be released at https://github.com/czczup/ViT-Adapter. |
| Open Datasets | Yes | Our detection experiments are based on MMDetection (Chen et al., 2019b) and the COCO (Lin et al., 2014) dataset. We evaluate our Vi T-Adapter on semantic segmentation with the ADE20K (Zhou et al., 2017) dataset and MMSegmentation (Contributors, 2020) codebase. |
| Dataset Splits | Yes | Figure 2: Object detection performance on COCO val2017 using Mask R-CNN. Table 1: Object detection and instance segmentation with Mask R-CNN on COCO val2017. |
| Hardware Specification | Yes | The per-iteration training time and GPU training memory are measured by A100 GPUs with per-GPU batch size 2 and FP16 training. |
| Software Dependencies | No | The paper mentions using MMDetection and MMSegmentation codebases but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Following common practices (Wang et al., 2021), we adopt a 1x or 3x training schedule (i.e., 12 or 36 epochs) with a batch size of 16, and the AdamW (Loshchilov & Hutter, 2017) optimizer with an initial learning rate of 1 × 10⁻⁴ and a weight decay of 0.05. We use a layer-wise learning rate decay of 0.9, and a drop path rate of 0.4. (See the optimizer sketch below.) |
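
The quoted setup maps naturally onto a PyTorch optimizer construction. The sketch below is a minimal illustration of AdamW with lr 1 × 10⁻⁴, weight decay 0.05, and layer-wise learning-rate decay 0.9; it assumes a ViT-style backbone whose transformer blocks are named `blocks.{i}`, and the helper `layerwise_lr_groups` and its parameter-name conventions are hypothetical, not taken from the authors' released configuration.

```python
# Hedged sketch of the reported optimizer settings: AdamW, base lr 1e-4,
# weight decay 0.05, layer-wise lr decay 0.9. Parameter naming is assumed,
# not the authors' exact code.
import torch
import torch.nn as nn


def layerwise_lr_groups(model: nn.Module, num_layers: int,
                        base_lr: float = 1e-4, decay: float = 0.9,
                        weight_decay: float = 0.05):
    """Build AdamW parameter groups where block i gets
    base_lr * decay ** (num_layers - layer_id), a common layer-wise
    decay scheme for ViT backbones."""
    groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumed naming convention: parameters of transformer block i
        # live under "blocks.i."; patch embedding gets the smallest lr,
        # everything else (e.g. adapter, head) the full lr.
        if 'blocks.' in name:
            layer_id = int(name.split('blocks.')[1].split('.')[0]) + 1
        elif 'patch_embed' in name:
            layer_id = 0
        else:
            layer_id = num_layers
        scale = decay ** (num_layers - layer_id)
        groups.append({'params': [param],
                       'lr': base_lr * scale,
                       'weight_decay': weight_decay})
    return groups


# Usage (assuming `model` is a ViT-style backbone with 12 blocks):
# optimizer = torch.optim.AdamW(layerwise_lr_groups(model, num_layers=12))
```

The 0.9 factor here mirrors the layer-wise learning-rate decay quoted in the row above; the drop path rate of 0.4 is a regularization setting on the backbone itself and would be configured on the model, not the optimizer.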