CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Authors: Xipeng Cao, Peng Yuan, Bailan Feng, Kun Niu

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark. CF-DETR achieves state-of-the-art performance among end-to-end detectors, e.g., achieving 47.8 AP using ResNet-50 with 36 epochs under the standard 3× training schedule.
Researcher Affiliation | Collaboration | Xipeng Cao (1), Peng Yuan (2), Bailan Feng (2), Kun Niu (1); (1) Beijing University of Posts and Telecommunications, (2) Huawei Noah's Ark Lab; {xpcao,niukun}@bupt.edu.cn, {yuanpeng126,fengbailan}@huawei.com
Pseudocode | No | The paper includes architectural diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | The MS COCO (Lin et al. 2014) instance detection dataset is utilized to evaluate detectors.
Dataset Splits | Yes | All models are trained on the COCO train2017 set with 118k images and evaluated on the val2017 set with 5k images. (A loading sketch follows this table.)
Hardware Specification | Yes | CF-DETR is trained on 8 NVIDIA Tesla V100 GPUs, and the batch size is 16 in total. We follow the default 3× training schedule of Detectron2, and the initial learning rate is set to 1e-4. Data augmentations and trade-off hyperparameters in the detection loss are the same as in DETR.
Software Dependencies | No | The paper mentions 'Detectron2' but does not provide specific version numbers for any software dependencies, libraries, or frameworks.
Experiment Setup | Yes | The number of CF decoder layers is set to 6 by default. The settings of the coarse layers are the same as the Transformer decoder in DETR. In the fine layer, the shape of the RoI feature maps is 256×7×7. The spatial size k in the ASF module is set to 3, and the dimension scaling factor r and the local attention size k in the LCA are set to 4 and 3, respectively. The default number of object queries is 100. Training details: the AdamW (Loshchilov and Hutter 2019) optimizer with weight decay 1e-4 is adopted in the training process. CF-DETR is trained on 8 NVIDIA Tesla V100 GPUs, and the batch size is 16 in total. We follow the default 3× training schedule of Detectron2, and the initial learning rate is set to 1e-4. Data augmentations and trade-off hyperparameters in the detection loss are the same as in DETR. (Sketches of the RoI feature shape and optimizer settings follow this table.)
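
The splits above are the standard COCO 2017 detection splits. A minimal loading sketch using pycocotools; the annotation file paths are assumptions, since the paper releases no code:

    # Load the standard COCO 2017 detection splits (hypothetical local paths).
    from pycocotools.coco import COCO

    coco_train = COCO("annotations/instances_train2017.json")  # ~118k training images
    coco_val = COCO("annotations/instances_val2017.json")      # 5k validation images

    # Sanity-check the split sizes reported in the paper.
    print(len(coco_train.imgs), len(coco_val.imgs))  # expect 118287 and 5000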
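
The fine layer's 256×7×7 RoI feature shape can be illustrated with torchvision's RoIAlign. This is only a sketch of the reported tensor shape; CF-DETR's actual fine-layer code is unreleased, and the feature map and box below are placeholders:

    import torch
    from torchvision.ops import roi_align

    # A dummy 256-channel feature map standing in for real backbone features.
    feat = torch.randn(1, 256, 50, 50)
    # One box in (batch_index, x1, y1, x2, y2) format.
    boxes = torch.tensor([[0.0, 10.0, 10.0, 40.0, 40.0]])

    # Pool each box into a fixed 7x7 grid over the 256 channels.
    rois = roi_align(feat, boxes, output_size=(7, 7), spatial_scale=1.0)
    print(rois.shape)  # torch.Size([1, 256, 7, 7]), matching the reported shape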
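
The optimizer and schedule settings quoted above translate directly into PyTorch. A hedged sketch: `model` is a stand-in for the unreleased CF-DETR, and the 270k-iteration milestones are the standard Detectron2 3× schedule values at batch size 16, an assumption not spelled out in the paper:

    import torch

    model = torch.nn.Linear(256, 91)  # placeholder for the unreleased CF-DETR

    # AdamW with the reported initial learning rate and weight decay.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

    # Detectron2's default 3x schedule: 270k iterations with 10x learning-rate
    # drops at 210k and 250k iterations (standard Detectron2 values, assumed).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[210_000, 250_000], gamma=0.1
    )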