Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the PASCAL VOC detection benchmarks [4], where RPNs with Fast R-CNNs produce detection accuracy better than the strong baseline of Selective Search with Fast R-CNNs. Meanwhile, our method waives nearly all computational burdens of SS at test-time; the effective running time for proposals is just 10 milliseconds. Using the expensive very deep models of [19], our detection method still has a frame rate of 5fps (including all steps) on a GPU, and thus is a practical object detection system in terms of both speed and accuracy (73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on 2012). |
| Researcher Affiliation | Collaboration | Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun; Microsoft Research; EMAIL. Shaoqing Ren is with the University of Science and Technology of China. This work was done when he was an intern at Microsoft Research. |
| Pseudocode | No | The paper describes the algorithms and training scheme in paragraph text and figures, but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at https://github.com/ShaoqingRen/faster_rcnn. |
| Open Datasets | Yes | We comprehensively evaluate our method on the PASCAL VOC 2007 detection benchmark [4]. This dataset consists of about 5k trainval images and 5k test images over 20 object categories. We also provide results in the PASCAL VOC 2012 benchmark for a few models. [4] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results, 2007. |
| Dataset Splits | Yes | This dataset consists of about 5k trainval images and 5k test images over 20 object categories. We also provide results in the PASCAL VOC 2012 benchmark for a few models. For the ImageNet pre-trained network, we use the fast version of ZF net [23] that has 5 conv layers and 3 fc layers, and the public VGG-16 model [19] that has 13 conv layers and 3 fc layers. |
| Hardware Specification | Yes | Table 4: Timing (ms) on a K40 GPU, except SS proposal is evaluated in a CPU. |
| Software Dependencies | No | Our implementation uses Caffe [10]. The paper does not specify the version of Caffe or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use a learning rate of 0.001 for 60k mini-batches, and 0.0001 for the next 20k mini-batches on the PASCAL dataset. We also use a momentum of 0.9 and a weight decay of 0.0005 [11]. Our implementation uses Caffe [10]. We randomly initialize all new layers by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01. For anchors, we use 3 scales with box areas of 128^2, 256^2, and 512^2 pixels, and 3 aspect ratios of 1:1, 1:2, and 2:1. |
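
The anchor configuration and learning-rate schedule quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' Caffe implementation: the function names `make_anchors` and `learning_rate` are hypothetical, the aspect ratio is assumed to mean height divided by width, and each anchor is assumed to be centered at the origin of its sliding-window position.

```python
import math


def make_anchors(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return 9 reference boxes (x1, y1, x2, y2) centered at the origin.

    Each scale s gives a box area of s^2 pixels; each ratio r is taken
    as height / width (an assumed convention, not stated in the excerpt).
    """
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            w = math.sqrt(area / r)  # width chosen so that w * h == area
            h = w * r                # height follows from the ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def learning_rate(iteration):
    """Step schedule from the quoted setup: 0.001 for the first 60k
    mini-batches, then 0.0001 for the next 20k."""
    return 0.001 if iteration < 60_000 else 0.0001


anchors = make_anchors()
```

Under these assumptions, `make_anchors()` yields 3 scales x 3 ratios = 9 boxes per position, and every box at a given scale covers the same area regardless of its aspect ratio.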