Deformable DETR: Deformable Transformers for End-to-End Object Detection

Authors: Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Collaboration | Xizhou Zhu¹, Weijie Su², Lewei Lu¹, Bin Li², Xiaogang Wang¹˒³, Jifeng Dai¹ (¹SenseTime Research; ²University of Science and Technology of China; ³The Chinese University of Hong Kong) |
| Pseudocode | No | The paper does not contain any section or figure explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code is released at https://github.com/fundamentalvision/Deformable-DETR. |
| Open Datasets | Yes | We conduct experiments on the COCO 2017 dataset (Lin et al., 2014). |
| Dataset Splits | Yes | Our models are trained on the train set, and evaluated on the val set and test-dev set. (See the loading sketch below the table.) |
| Hardware Specification | Yes | Run time is evaluated on an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer (Kingma & Ba, 2015)" but does not specify versions for other key software components, such as a deep learning framework (e.g., PyTorch, TensorFlow) or specific libraries. |
| Experiment Setup | Yes | M = 8 and K = 4 are set for deformable attention by default. Parameters of the deformable Transformer encoder are shared among different feature levels. Other hyper-parameter settings and the training strategy mainly follow DETR (Carion et al., 2020), except that Focal Loss (Lin et al., 2017b) with a loss weight of 2 is used for bounding-box classification, and the number of object queries is increased from 100 to 300. By default, models are trained for 50 epochs and the learning rate is decayed at the 40th epoch by a factor of 0.1. Following DETR (Carion et al., 2020), we train our models using the Adam optimizer (Kingma & Ba, 2015) with a base learning rate of 2×10⁻⁴, β₁ = 0.9, β₂ = 0.999, and weight decay of 10⁻⁴. Learning rates of the linear projections, used for predicting object query reference points and sampling offsets, are multiplied by a factor of 0.1. (See the PyTorch sketch below the table.) |
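
For concreteness, the split usage quoted in the Dataset Splits row maps onto a standard COCO 2017 download. The following is a minimal torchvision sketch; the directory layout and paths are assumptions about a typical COCO setup, not something specified in the paper.

```python
from torchvision.datasets import CocoDetection  # requires pycocotools

# Assumed standard COCO 2017 directory layout; adjust paths to your download.
train_set = CocoDetection(
    root="coco/train2017",
    annFile="coco/annotations/instances_train2017.json",
)
val_set = CocoDetection(
    root="coco/val2017",
    annFile="coco/annotations/instances_val2017.json",
)

# COCO 2017: 118,287 train / 5,000 val images.
print(len(train_set), len(val_set))

# test-dev annotations are withheld; results on test-dev are obtained by
# submitting predictions to the COCO evaluation server.
```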
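
The training recipe in the Experiment Setup row translates directly into standard PyTorch primitives. The sketch below is an illustration under assumptions: the model is a toy stand-in, and the module names matched by the 0.1× learning-rate filter (`reference_points`, `sampling_offsets`) mirror the naming used in the released repository but are not the authors' exact code.

```python
import torch
from torch import nn

# Toy stand-in for the network; in practice this is the full Deformable DETR
# model. The module names are illustrative so the filter below has something
# to match.
model = nn.ModuleDict({
    "backbone": nn.Linear(256, 256),
    "reference_points": nn.Linear(256, 2),   # predicts query reference points
    "sampling_offsets": nn.Linear(256, 64),  # predicts deformable sampling offsets
})

def reduced_lr(name: str) -> bool:
    """Linear projections predicting reference points / sampling offsets get 0.1x LR."""
    return "reference_points" in name or "sampling_offsets" in name

base_lr = 2e-4  # base learning rate of 2x10^-4 quoted above
param_groups = [
    {"params": [p for n, p in model.named_parameters() if not reduced_lr(n)],
     "lr": base_lr},
    {"params": [p for n, p in model.named_parameters() if reduced_lr(n)],
     "lr": base_lr * 0.1},
]

# Adam with the quoted betas and weight decay.
optimizer = torch.optim.Adam(param_groups, betas=(0.9, 0.999), weight_decay=1e-4)

# 50-epoch schedule with a single 0.1 decay at epoch 40.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40], gamma=0.1)

for epoch in range(50):
    # ... one training epoch over COCO train2017 goes here ...
    scheduler.step()
```

The Focal Loss mentioned in the row corresponds to a sigmoid focal loss on the classification head (e.g. `torchvision.ops.sigmoid_focal_loss`), weighted by a factor of 2 in the overall objective.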