Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GiraffeDet: A Heavy-Neck Paradigm for Object Detection

Authors: Yiqi Jiang, Zhiyu Tan, Junyan Wang, Xiuyu Sun, Ming Lin, Hao Li

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Numerical evaluations on multiple popular object detection benchmarks show that GiraffeDet consistently outperforms previous SOTA models across a wide spectrum of resource constraints.
Researcher Affiliation Industry DAMO Academy, Alibaba Group
Pseudocode No Information is insufficient. The paper describes the methods in text and uses diagrams, but does not include pseudocode or algorithm blocks.
Open Source Code Yes The source code is available at https://github.com/jyqi/GiraffeDet.
Open Datasets Yes We evaluate GiraffeDet on the COCO 2017 detection dataset with 80 object categories. It includes 115k images for training (train), 5k images for validation (val), and 20k images with no public ground truth for testing (test-dev). The training of all methods is conducted on the 115k training images. We report results on the validation dataset for the ablation study, and results on the test-dev dataset from the evaluation server for the state-of-the-art comparison and DCN-related comparison.
Dataset Splits Yes It includes 115k images for training (train), 5k images for validation (val), and 20k images with no public ground truth for testing (test-dev).
Hardware Specification No Information is insufficient. The paper mentions "popular GPUs" and "multi-GPU training" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies No Information is insufficient. The paper mentions frameworks like mmdetection and methods like GFocal V2 and ATSS, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup Yes For fair comparison, all results are produced under mmdetection (Chen et al., 2019b) and the standard COCO-style evaluation protocol. GFocal V2 (Li et al., 2021) and ATSS (Zhang et al., 2020) are applied as head and anchor assigner, respectively. Following the work of He et al. (2019), all models are trained from scratch to reduce the influence of ImageNet-pretrained backbones. The shorter side of input images is resized to 800 and the maximum size is restricted to 1333. To enhance the stability of scratch training, we adopt multi-scale training for all models, including: a 2x ImageNet-pretrained (p-2x) learning schedule (24 epochs, decays at 16 and 22 epochs) only in the R2-101-DCN backbone experiments, a 3x scratch (s-3x) learning schedule (36 epochs, decays at 28 and 33 epochs) in the ablation study, and a 6x scratch (s-6x) learning schedule (72 epochs, decays at 65 and 71 epochs) in the state-of-the-art comparison. More implementation details are in Appendix B. (Table 8 in Appendix B):
Batch Size per GPU: 2
Optimizer: SGD
Learning Rate: 0.02
Step Decrease Ratio: 0.1
Momentum: 0.9
Weight Decay: 1.0 x 10^-4
Input Image Size: [1333, 800]
Multi-Scale Range (Ablation Study): [0.8, 1.0]
Multi-Scale Range (SOTA): [0.6, 1.2]
GFPN Input Channels: [128, 256, 512, 1024, 2048]
GFPN Output Channels: [256, 256, 256, 256, 256]
Training Epochs (Ablation Study): 36 epochs from scratch (decays at 28 and 33 epochs)
Training Epochs (SOTA): 72 epochs from scratch (decays at 65 and 71 epochs)
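The step-decay schedules described above (base learning rate 0.02, multiplied by the step-decrease ratio 0.1 at fixed epochs) can be sketched as a small helper. This is an illustrative reconstruction, not code from the paper or mmdetection; the function name `step_lr` and its defaults (set to the s-6x schedule: 72 epochs, decays at 65 and 71) are assumptions.

```python
def step_lr(epoch, base_lr=0.02, decay_epochs=(65, 71), ratio=0.1):
    """Step-decay learning rate, as reported for GiraffeDet training.

    The learning rate starts at `base_lr` and is multiplied by `ratio`
    once for each decay epoch that has been reached. Defaults follow the
    s-6x schedule; pass decay_epochs=(28, 33) for s-3x or (16, 22) for p-2x.
    """
    lr = base_lr
    for e in decay_epochs:
        if epoch >= e:
            lr *= ratio
    return lr

# s-6x schedule: 0.02 for epochs 0-64, 0.002 for 65-70, 0.0002 for epoch 71
schedule = [step_lr(e) for e in range(72)]
```

The same helper reproduces the other two schedules by changing `decay_epochs`, since all three use the same 0.1 step-decrease ratio.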