Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution

Authors: Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments are performed on the COCO 2017 detection dataset [26]. All the models are trained on the train split (115k images). The region proposal performance and ablation analysis are reported on val split (5k images), and the benchmarking detection performance is reported on test-dev split (20k images).
Researcher Affiliation | Academia | Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, {thangvubk,wiseholi,trungpx,cd_yoo}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Cascade RPN (a hedged sketch of this loop follows the table)
Open Source Code | Yes | The code is made publicly available at https://github.com/thangvubk/Cascade-RPN.
Open Datasets | Yes | The experiments are performed on the COCO 2017 detection dataset [26].
Dataset Splits | Yes | All the models are trained on the train split (115k images). The region proposal performance and ablation analysis are reported on val split (5k images), and the benchmarking detection performance is reported on test-dev split (20k images).
Hardware Specification | Yes | It takes about 12 hours for the models to converge on 8 Tesla V100 GPUs.
Software Dependencies | No | The models are implemented with PyTorch [29] and mmdetection [8]. Specific version numbers for these software components are not provided.
Experiment Setup | Yes | The model consists of two stages, with ResNet50-FPN [24] being its backbone. A single anchor per location is used with sizes of 32², 64², 128², 256², and 512² corresponding to the feature levels C2, C3, C4, C5, and C6, respectively [24]. The first stage uses the anchor-free metric for sample discrimination, with the center-region and ignore-region thresholds σ_ctr and σ_ign, adopted from [40, 37], set to 0.2 and 0.5. The second stage uses the anchor-based metric with an IoU threshold of 0.7. The multi-task loss is set with the stage-wise weights α1 = α2 = 1 and the balance term λ = 10. The NMS threshold is set to 0.8. In all experiments, the long edge and the short edge of the images are resized to 1333 and 800 respectively without changing the aspect ratio. No data augmentation is used except for standard horizontal image flipping... The models are trained with 8 GPUs with a batch size of 16 (two images per GPU) for 12 epochs using the SGD optimizer. The learning rate is initialized to 0.02 and divided by 10 after 8 and 11 epochs. (The key schedule numbers are collected in the training sketch below.)
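The paper's Algorithm 1 describes a multi-stage proposal loop: each stage refines the anchors, and an adaptive convolution realigns the features to the refined anchors before the next regression. Below is a minimal PyTorch sketch of that loop, assuming torchvision's deform_conv2d as the adaptive-convolution primitive (the paper implements adaptive convolution as a deformable convolution whose offsets come from the anchors rather than from a learned branch); anchor_offsets and apply_deltas are hypothetical helpers, and none of this is the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class AdaptiveConv(nn.Module):
    """3x3 convolution whose sampling locations follow the current anchor.

    Sketched with deformable convolution; the offsets are computed from the
    anchor center/shape instead of being predicted from the features.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_uniform_(self.weight, a=1)

    def forward(self, feat, offsets):
        # offsets: (N, 2*3*3, H, W), one (dy, dx) per kernel tap, derived from
        # the anchors so the kernel samples inside the anchor box.
        return deform_conv2d(feat, offsets, self.weight, padding=1)


class CascadeRPNStage(nn.Module):
    def __init__(self, channels: int, with_cls: bool):
        super().__init__()
        self.adapt = AdaptiveConv(channels)
        self.reg = nn.Conv2d(channels, 4, 1)                        # anchor deltas
        self.cls = nn.Conv2d(channels, 1, 1) if with_cls else None  # objectness

    def forward(self, feat, anchors, stride):
        offsets = anchor_offsets(anchors, stride)   # hypothetical helper
        feat = F.relu(self.adapt(feat, offsets))
        anchors = apply_deltas(anchors, self.reg(feat))  # hypothetical helper
        scores = self.cls(feat) if self.cls is not None else None
        return feat, anchors, scores


def cascade_rpn_forward(stages, feat, anchors, stride):
    """Algorithm 1, loosely: refine anchors stage by stage; only the last
    stage carries a classification branch that scores the final anchors."""
    scores = None
    for stage in stages:
        feat, anchors, scores = stage(feat, anchors, stride)
    # NMS (threshold 0.8 in the paper) is applied to the scored anchors downstream.
    return anchors, scores

In the two-stage configuration described above, stages would be [CascadeRPNStage(c, with_cls=False), CascadeRPNStage(c, with_cls=True)], matching the paper's choice of scoring only once, after the final refinement.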
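The schedule quoted in the last row maps onto a standard PyTorch training loop. The sketch below collects those numbers (SGD, lr 0.02 divided by 10 after epochs 8 and 11, 12 epochs, batch size 16); the momentum and weight decay values are assumed mmdetection defaults rather than figures stated in the quote, and build_cascade_rpn / train_loader are placeholders, not the repository's API.

import torch

model = build_cascade_rpn()  # placeholder constructor, not the repo's API
# momentum/weight decay are assumed mmdetection defaults, not given in the quote
optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=1e-4)
# lr initialized to 0.02, divided by 10 after epochs 8 and 11 (12-epoch schedule)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[8, 11], gamma=0.1)

for epoch in range(12):
    for images, targets in train_loader:  # batch size 16 = 2 images x 8 GPUs
        # multi-task loss with stage weights alpha1 = alpha2 = 1 and
        # regression balance term lambda = 10, per the quoted setup
        loss = model(images, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()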