CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection
Authors: Xipeng Cao, Peng Yuan, Bailan Feng, Kun Niu
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark. CF-DETR achieves state-of-the-art performance among end-to-end detectors, e.g., achieving 47.8 AP using ResNet-50 with 36 epochs in the standard 3x training schedule. |
| Researcher Affiliation | Collaboration | Xipeng Cao¹, Peng Yuan², Bailan Feng², Kun Niu¹. ¹Beijing University of Posts and Telecommunications; ²Huawei Noah's Ark Lab. {xpcao,niukun}@bupt.edu.cn, {yuanpeng126,fengbailan}@huawei.com |
| Pseudocode | No | The paper includes architectural diagrams but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | MS COCO (Lin et al. 2014) instance detection dataset is utilized to evaluate detectors. |
| Dataset Splits | Yes | All models are trained on the COCO train2017 set with 118k images and evaluated on the val2017 set with 5k images. |
| Hardware Specification | Yes | CF-DETR is trained on 8 NVIDIA Tesla V100 GPUs, and the batch size is 16 in total. We follow the default 3x training schedule of Detectron2 and the initial learning rate is set to 1×10⁻⁴. Data augmentations and trade-off hyperparameters in detection loss are the same as in DETR. |
| Software Dependencies | No | The paper mentions 'Detectron2' but does not provide specific version numbers for any software dependencies, libraries, or frameworks. |
| Experiment Setup | Yes | The number of CF decoder layers is set to 6 by default. The settings of coarse layers are the same as the Transformer decoder in DETR. In the fine layer, the shape of RoI feature maps is 256×7×7. The spatial size k in the ASF module is set to 3, and the dimension scaling factor r and the local attention size k in the LCA are set to 4 and 3 respectively. The default number of object queries is 100. Training Details. The AdamW (Loshchilov and Hutter 2019) optimizer with weight decay 1e-4 is adopted in the training process. CF-DETR is trained on 8 NVIDIA Tesla V100 GPUs, and the batch size is 16 in total. We follow the default 3x training schedule of Detectron2 and the initial learning rate is set to 1×10⁻⁴. Data augmentations and trade-off hyperparameters in detection loss are the same as in DETR. (The reported hyperparameters are collected into a single configuration sketch below the table.) |
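For readers attempting a reproduction, the hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be gathered into one place. The Python sketch below is a minimal illustration only: the dictionary and its key names (e.g. `ASF_SPATIAL_SIZE`, `LCA_DIM_SCALING`) are invented for readability and do not correspond to any official CF-DETR or Detectron2 configuration file.

```python
# Hypothetical summary of the hyperparameters reported in the paper.
# Key names are illustrative assumptions, not an official config schema.
CF_DETR_CONFIG = {
    # CF decoder architecture (Experiment Setup row)
    "NUM_DECODER_LAYERS": 6,           # coarse layers follow the DETR decoder
    "ROI_FEATURE_SHAPE": (256, 7, 7),  # RoI feature maps in the fine layer
    "ASF_SPATIAL_SIZE": 3,             # spatial size k of the ASF module
    "LCA_DIM_SCALING": 4,              # dimension scaling factor r in LCA
    "LCA_LOCAL_ATTN_SIZE": 3,          # local attention size k in LCA
    "NUM_OBJECT_QUERIES": 100,

    # Training details (Hardware Specification / Experiment Setup rows)
    "OPTIMIZER": "AdamW",
    "BASE_LR": 1e-4,
    "WEIGHT_DECAY": 1e-4,
    "NUM_GPUS": 8,                     # NVIDIA Tesla V100
    "TOTAL_BATCH_SIZE": 16,
    "SCHEDULE": "3x",                  # Detectron2 default 3x schedule (36 epochs)
}

if __name__ == "__main__":
    # Print the collected settings in a readable column layout.
    for key, value in CF_DETR_CONFIG.items():
        print(f"{key:>20}: {value}")
```

Note that the paper does not release code, so any of these values that feed into an actual training script would still need to be wired into a DETR/Detectron2-style pipeline by hand.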