DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods. Code is available at https://github.com/IDEA-opensource/DAB-DETR. (...) Our results demonstrate that DAB-DETR attains the best performance among DETR-like architectures under the same setting on the COCO object detection benchmark. The proposed method can achieve 45.7% AP when using a single ResNet-50 (He et al., 2016) model as backbone for training 50 epochs.
Researcher Affiliation | Collaboration | Shilong Liu (1,2), Feng Li (2,3), Hao Zhang (2,3), Xiao Yang (1), Xianbiao Qi (2), Hang Su (1,4), Jun Zhu (1,4), Lei Zhang (2). 1: Dept. of Comp. Sci. and Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., Institute for AI, Tsinghua-Bosch Joint Center for ML, Tsinghua University. 2: International Digital Economy Academy (IDEA). 3: Hong Kong University of Science and Technology. 4: Peng Cheng Laboratory, Shenzhen, Guangdong, China.
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/IDEA-opensource/DAB-DETR. (...) We have released the source code on Github at https://github.com/IDEA-opensource/DAB-DETR with all materials that are needed to reproduce our results.
Open Datasets | Yes | Dataset. We conduct the experiments on the COCO (Lin et al., 2014) object detection dataset. All models are trained on the train2017 split and evaluated on the val2017 split.
Dataset Splits | Yes | All models are trained on the train2017 split and evaluated on the val2017 split.
Hardware Specification | Yes | All models are trained on Nvidia A100 GPU.
Software Dependencies | No | The paper mentions using components like ResNet, Transformer, AdamW, PReLU, and focal loss, but it does not specify version numbers for these or any other software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | All models are trained on 16 GPUs with 1 image per GPU, and AdamW (Loshchilov & Hutter, 2018) is used for training with weight decay 10^-4. The learning rates for the backbone and other modules are set to 10^-5 and 10^-4, respectively. We train our models for 50 epochs and drop the learning rate by 0.1 after 40 epochs. (...) We also use focal loss (Lin et al., 2020) with α = 0.25, γ = 2 for classification. (...) L1 loss with coefficient 5.0 and GIOU loss (Rezatofighi et al., 2019) with coefficient 2.0 are used consistently in both the matching and the final loss calculation procedures.
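The hyperparameters quoted in this row can be sketched in plain Python. This is not the authors' code: the function names are illustrative, and the focal-loss form follows the standard definition from Lin et al. (2020) with the paper's α and γ, alongside the reported step learning-rate schedule and loss coefficients.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-8):
    """Focal loss for one binary prediction (Lin et al., 2020).

    p: predicted probability of the positive class; target: 0 or 1.
    The paper uses alpha = 0.25, gamma = 2 for classification.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t + eps)

def lr_at_epoch(epoch, base_lr, drop_epoch=40, factor=0.1):
    """Step schedule: drop the learning rate by 0.1 after 40 of 50 epochs."""
    return base_lr * (factor if epoch >= drop_epoch else 1.0)

# Reported settings: 1e-5 for the backbone, 1e-4 for other modules;
# box regression combines L1 (coeff 5.0) and GIOU (coeff 2.0) losses
# in both the Hungarian matching cost and the final loss.
BACKBONE_LR, OTHER_LR = 1e-5, 1e-4
L1_COEF, GIOU_COEF = 5.0, 2.0
```

The (1 - p_t)^γ factor down-weights well-classified examples, which is why focal loss is preferred over plain cross-entropy for the heavily imbalanced matching in DETR-style detectors.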