R-FCN: Object Detection via Region-based Fully Convolutional Networks
Authors: Jifeng Dai, Yi Li, Kaiming He, Jian Sun
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20× faster than the Faster R-CNN counterpart. |
| Researcher Affiliation | Collaboration | Jifeng Dai, Microsoft Research Asia; Yi Li, Tsinghua University; Kaiming He, Microsoft Research; Jian Sun, Microsoft Research. This work was done when Yi Li was an intern at Microsoft Research. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is made publicly available at: https://github.com/daijifeng001/r-fcn. |
| Open Datasets | Yes | We train the models on the union set of VOC 2007 trainval and VOC 2012 trainval (07+12) following [7], and evaluate on VOC 2007 test set. Object detection accuracy is measured by mean Average Precision (mAP). ... Next we evaluate on the MS COCO dataset [14] that has 80 object categories. Our experiments involve the 80k train set, 40k val set, and 20k test-dev set. |
| Dataset Splits | Yes | We train the models on the union set of VOC 2007 trainval and VOC 2012 trainval (07+12) following [7]... Our experiments involve the 80k train set, 40k val set, and 20k test-dev set. |
| Hardware Specification | Yes | Timing is evaluated on a single Nvidia K40 GPU. |
| Software Dependencies | No | The paper builds on components such as ResNet and FCNs, but does not provide version numbers for the software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | We use a weight decay of 0.0005 and a momentum of 0.9. By default we use single-scale training: images are resized such that the scale (shorter side of image) is 600 pixels [7, 19]. Each GPU holds 1 image and selects B = 128 RoIs for backprop. We train the model with 8 GPUs (so the effective mini-batch size is 8). We fine-tune R-FCN using a learning rate of 0.001 for 20k mini-batches and 0.0001 for 10k mini-batches on VOC. |
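As a compact sketch, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. This is illustrative only: the names below are hypothetical and do not come from the released r-fcn code; only the numeric values are taken from the paper.

```python
# Hypothetical configuration mirroring the VOC fine-tuning setup
# reported in the R-FCN paper (values from the paper; names invented here).
SOLVER = {
    "weight_decay": 0.0005,
    "momentum": 0.9,
    "images_per_gpu": 1,
    "num_gpus": 8,             # effective mini-batch size = 1 * 8 = 8
    "rois_per_image": 128,     # B = 128 RoIs selected for backprop
    "short_side_pixels": 600,  # single-scale training
}


def learning_rate(iteration: int) -> float:
    """VOC schedule: 0.001 for the first 20k mini-batches, then 0.0001."""
    return 0.001 if iteration < 20_000 else 0.0001
```

For example, `learning_rate(5_000)` returns 0.001 and `learning_rate(25_000)` returns 0.0001, matching the 20k/10k two-step schedule described above.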