ViDT: An Efficient and Effective Fully Transformer-based Object Detector
Authors: Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation results on the Microsoft COCO benchmark dataset demonstrate that ViDT obtains the best AP and latency trade-off among existing fully transformer-based object detectors, and achieves 49.2 AP owing to its high scalability for large models. |
| Researcher Affiliation | Collaboration | ¹NAVER AI Lab, ²Google Research, ³University of California at Merced, ⁴Yonsei University |
| Pseudocode | No | The paper includes figures illustrating the architecture but no formal pseudocode or numbered algorithm blocks. |
| Open Source Code | Yes | We release the code and trained models at https://github.com/naver-ai/vidt. |
| Open Datasets | Yes | We carry out object detection experiments on the Microsoft COCO 2017 benchmark dataset (Lin et al., 2014). |
| Dataset Splits | Yes | All the fully transformer-based object detectors are trained on 118K training images and tested on 5K validation images following the literature (Carion et al., 2020). (A data-loading sketch follows the table.) |
| Hardware Specification | Yes | All the algorithms are implemented using PyTorch and executed using eight NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions PyTorch and AdamW but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train ViDT for 50 epochs using AdamW (Loshchilov & Hutter, 2019) with the same initial learning rate of 10⁻⁴ for its body, neck, and head. The learning rate is decayed by cosine annealing with a batch size of 16, weight decay of 1×10⁻⁴, and gradient clipping of 0.1. (An optimization sketch follows the table.) |
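
The COCO 2017 split quoted in the Dataset Splits row maps directly onto the standard torchvision loader. The sketch below is illustrative rather than taken from the paper: it assumes a conventional COCO directory layout (`coco/train2017`, `coco/val2017`, and the `instances_*.json` annotation files), and requires `pycocotools` to be installed.

```python
from torchvision.datasets import CocoDetection

# Assumed local paths for a standard COCO 2017 download; adjust as needed.
train_set = CocoDetection(
    root="coco/train2017",
    annFile="coco/annotations/instances_train2017.json",
)
val_set = CocoDetection(
    root="coco/val2017",
    annFile="coco/annotations/instances_val2017.json",
)

# On a standard COCO 2017 download this prints ~118287 and 5000, matching
# the 118K-train / 5K-val split reported in the paper.
print(len(train_set), len(val_set))
```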
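
The Experiment Setup row fully determines the optimization recipe, which can be sketched in PyTorch. `DummyDetector` and the random batches below are hypothetical stand-ins for the actual ViDT model and COCO loader (available in the released repository at https://github.com/naver-ai/vidt); only the hyperparameters (AdamW, learning rate 10⁻⁴, weight decay 10⁻⁴, cosine annealing over 50 epochs, batch size 16, gradient clipping at 0.1) come from the paper.

```python
import torch
from torch import nn

# Hypothetical stand-in for the ViDT detector; the real model is in the repo.
class DummyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(256, 256)
        self.head = nn.Linear(256, 4)  # e.g., box-regression outputs

    def forward(self, x):
        return self.head(torch.relu(self.backbone(x)))

model = DummyDetector()

# AdamW with lr 1e-4 and weight decay 1e-4, as reported in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
# Cosine annealing over the full 50-epoch schedule.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    for _ in range(10):  # placeholder for iterating the 118K-image train split
        images = torch.randn(16, 256)   # batch size 16, as in the paper
        targets = torch.randn(16, 4)    # dummy regression targets
        loss = nn.functional.l1_loss(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        # Clip gradient norm at 0.1, matching the reported setting.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```

Note that the sketch steps the cosine schedule per epoch, which is one common reading of "decayed by cosine annealing"; the released code should be consulted for the exact schedule granularity.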