Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution
Authors: Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments are performed on the COCO 2017 detection dataset [26]. All the models are trained on the train split (115k images). The region proposal performance and ablation analysis are reported on val split (5k images), and the benchmarking detection performance is reported on test-dev split (20k images). |
| Researcher Affiliation | Academia | Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo Department of Electrical Engineering Korea Advanced Institute of Science and Technology {thangvubk,wiseholi,trungpx,cd_yoo}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1. Cascade RPN |
| Open Source Code | Yes | The code is made publicly available at https://github.com/thangvubk/Cascade-RPN. |
| Open Datasets | Yes | The experiments are performed on the COCO 2017 detection dataset [26]. |
| Dataset Splits | Yes | All the models are trained on the train split (115k images). The region proposal performance and ablation analysis are reported on val split (5k images), and the benchmarking detection performance is reported on test-dev split (20k images). |
| Hardware Specification | Yes | It takes about 12 hours for the models to converge on 8 Tesla V100 GPUs. |
| Software Dependencies | No | The models are implemented with PyTorch [29] and mmdetection [8]. Specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | The model consists of two stages, with ResNet50-FPN [24] being its backbone. A single anchor per location is used with sizes of 32², 64², 128², 256², and 512² corresponding to the feature levels C2, C3, C4, C5, and C6, respectively [24]. The first stage uses the anchor-free metric for sample discrimination with the thresholds of the center-region σ_ctr and ignore-region σ_ign, which are adopted from [40, 37], being 0.2 and 0.5. The second stage uses the anchor-based metric with the IoU threshold of 0.7. The multi-task loss is set with the stage-wise weights α1 = α2 = 1 and the balance term λ = 10. The NMS threshold is set to 0.8. In all experiments, the long edge and the short edge of the images are resized to 1333 and 800 respectively without changing the aspect ratio. No data augmentation is used except for standard horizontal image flipping... The models are trained with 8 GPUs with a batch size of 16 (two images per GPU) for 12 epochs using the SGD optimizer. The learning rate is initialized to 0.02 and divided by 10 after 8 and 11 epochs. |
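
The training schedule and anchor configuration quoted in the Experiment Setup row can be sketched as a small standalone helper. This is a hypothetical illustration, not the authors' code (their implementation uses mmdetection); the function and variable names are assumptions.

```python
# Hypothetical sketch of the setup described above; not the authors' code.

def lr_at_epoch(epoch: int, base_lr: float = 0.02,
                decay_epochs: tuple = (8, 11), gamma: float = 0.1) -> float:
    """Step schedule: lr starts at 0.02 and is divided by 10 after epochs 8 and 11."""
    lr = base_lr
    for e in decay_epochs:
        if epoch >= e:
            lr *= gamma
    return lr

# Single anchor per location: area 32^2 on C2 doubling per level up to 512^2 on C6.
ANCHOR_AREAS = {f"C{i}": (32 * 2 ** (i - 2)) ** 2 for i in range(2, 7)}

if __name__ == "__main__":
    for epoch in range(12):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.4f}")
    print(ANCHOR_AREAS["C6"])  # 262144, i.e. 512^2
```

With an effective batch size of 16 (2 images x 8 GPUs), this matches the common mmdetection "1x" schedule (12 epochs, steps at 8 and 11) at the default base learning rate of 0.02.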