GiraffeDet: A Heavy-Neck Paradigm for Object Detection

Authors: Yiqi Jiang, Zhiyu Tan, Junyan Wang, Xiuyu Sun, Ming Lin, Hao Li

Venue: ICLR 2022

Reproducibility assessment (each entry lists the variable, the extracted result, and the supporting LLM response):
Variable: Research Type
Result: Experimental
LLM Response: Numerical evaluations on multiple popular object detection benchmarks show that GiraffeDet consistently outperforms previous SOTA models across a wide spectrum of resource constraints.

Variable: Researcher Affiliation
Result: Industry
LLM Response: DAMO Academy, Alibaba Group. {yiqi.jyq, zhiyu.tzy, wangjunyan.wjy}@alibaba-inc.com; {xiuyu.sxy, ming.l, lihao.lh}@alibaba-inc.com

Variable: Pseudocode
Result: No
LLM Response: Information is insufficient. The paper describes the methods in text and uses diagrams, but does not include pseudocode or algorithm blocks.

Variable: Open Source Code
Result: Yes
LLM Response: The source code is available at https://github.com/jyqi/GiraffeDet.

Variable: Open Datasets
Result: Yes
LLM Response: We evaluate GiraffeDet on the COCO 2017 detection dataset with 80 object categories. It includes 115k images for training (train), 5k images for validation (val), and 20k images with no public ground truth for testing (test-dev). The training of all methods is conducted on the 115k training images. We report results on the validation set for the ablation study, and results on the test-dev set from the evaluation server for the state-of-the-art comparison and the DCN-related comparison.

Variable: Dataset Splits
Result: Yes
LLM Response: It includes 115k images for training (train), 5k images for validation (val), and 20k images with no public ground truth for testing (test-dev).

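Since these splits follow the standard COCO 2017 protocol, reported val numbers can be checked with the official pycocotools evaluator. Below is a minimal sketch, assuming a detections file in the standard COCO result format; both file paths are hypothetical placeholders, not artifacts from the paper.

    # Minimal COCO-style evaluation sketch using pycocotools.
    # Paths are hypothetical; val2017 is the 5k-image split cited above.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    ann_file = "annotations/instances_val2017.json"    # ground-truth annotations
    det_file = "results/giraffedet_val2017_bbox.json"  # detections, COCO result format

    coco_gt = COCO(ann_file)
    coco_dt = coco_gt.loadRes(det_file)

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()  # prints AP, AP50, AP75, APs, APm, APl
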
Variable: Hardware Specification
Result: No
LLM Response: Information is insufficient. The paper mentions "popular GPUs" and "multi-GPU training" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.

Variable: Software Dependencies
Result: No
LLM Response: Information is insufficient. The paper mentions frameworks like mmdetection and methods like GFocal V2 and ATSS, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA).

Variable: Experiment Setup
Result: Yes
LLM Response: For fair comparison, all results are produced under mmdetection (Chen et al., 2019b) and the standard COCO-style evaluation protocol. GFocal V2 (Li et al., 2021) and ATSS (Zhang et al., 2020) are applied as head and anchor assigner, respectively. Following the work of He et al. (2019), all models are trained from scratch to reduce the influence of backbones pre-trained on ImageNet. The shorter side of input images is resized to 800 and the maximum size is restricted to 1333. To enhance the stability of scratch training, we adopt multi-scale training for all models, including: a 2x ImageNet-pretrained (p-2x) learning schedule (24 epochs, decays at epochs 16 and 22) only in the R2-101-DCN backbone experiments; a 3x scratch (s-3x) learning schedule (36 epochs, decays at epochs 28 and 33) in the ablation study; and a 6x scratch (s-6x) learning schedule (72 epochs, decays at epochs 65 and 71) in the state-of-the-art comparison. More implementation details are in Appendix B. Table 8 in Appendix B lists the hyperparameters:

Hyperparameter | Value
Batch Size per GPU | 2
Optimizer | SGD
Learning Rate | 0.02
Step Decrease Ratio | 0.1
Momentum | 0.9
Weight Decay | 1.0 x 10^-4
Input Image Size | [1333, 800]
Multi-Scale Range (Ablation Study) | [0.8, 1.0]
Multi-Scale Range (SOTA) | [0.6, 1.2]
GFPN Input Channels | [128, 256, 512, 1024, 2048]
GFPN Output Channels | [256, 256, 256, 256, 256]
Training Epochs (Ablation Study) | 36 epochs from scratch (decays at epochs 28 and 33)
Training Epochs (SOTA) | 72 epochs from scratch (decays at epochs 65 and 71)

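The quoted hyperparameters map directly onto an mmdetection training configuration. The following is a minimal sketch in mmdetection 2.x config syntax, reconstructed from Table 8 rather than taken from the authors' released configs; the concrete img_scale endpoints for multi-scale training are our assumption (shorter side 800 scaled by the SOTA range [0.6, 1.2], long side capped at 1333).

    # Sketch of the Table 8 hyperparameters as an mmdetection 2.x config
    # fragment. Reconstructed for illustration, not the authors' config.
    optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=1e-4)

    # s-6x schedule: 72 epochs from scratch, lr multiplied by 0.1
    # (the step decrease ratio) at epochs 65 and 71.
    lr_config = dict(policy='step', step=[65, 71], gamma=0.1)
    runner = dict(type='EpochBasedRunner', max_epochs=72)

    data = dict(samples_per_gpu=2)  # batch size per GPU

    # Multi-scale training over the SOTA range [0.6, 1.2]: shorter side
    # sampled in [480, 960], long side capped at 1333. These endpoints
    # are an assumption derived from the ranges above, not quoted values.
    train_resize = dict(
        type='Resize',
        img_scale=[(1333, 480), (1333, 960)],
        multiscale_mode='range',
        keep_ratio=True)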