R-FCN: Object Detection via Region-based Fully Convolutional Networks

Authors: Jifeng Dai, Yi Li, Kaiming He, Jian Sun

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20× faster than the Faster R-CNN counterpart.
Researcher Affiliation | Collaboration | Jifeng Dai (Microsoft Research Asia), Yi Li (Tsinghua University), Kaiming He (Microsoft Research), Jian Sun (Microsoft Research). This work was done when Yi Li was an intern at Microsoft Research.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is made publicly available at https://github.com/daijifeng001/r-fcn.
Open Datasets | Yes | We train the models on the union set of VOC 2007 trainval and VOC 2012 trainval (07+12) following [7], and evaluate on VOC 2007 test set. Object detection accuracy is measured by mean Average Precision (mAP). ... Next we evaluate on the MS COCO dataset [14] that has 80 object categories. Our experiments involve the 80k train set, 40k val set, and 20k test-dev set.
Dataset Splits | Yes | We train the models on the union set of VOC 2007 trainval and VOC 2012 trainval (07+12) following [7]... Our experiments involve the 80k train set, 40k val set, and 20k test-dev set.
Hardware Specification | Yes | Timing is evaluated on a single Nvidia K40 GPU.
Software Dependencies | No | The paper mentions software such as ResNet and FCNs, but does not provide specific version numbers for the dependencies needed to replicate the experiments.
Experiment Setup | Yes | We use a weight decay of 0.0005 and a momentum of 0.9. By default we use single-scale training: images are resized such that the scale (shorter side of image) is 600 pixels [7, 19]. Each GPU holds 1 image and selects B = 128 RoIs for backprop. We train the model with 8 GPUs (so the effective mini-batch size is 8). We fine-tune R-FCN using a learning rate of 0.001 for 20k mini-batches and 0.0001 for 10k mini-batches on VOC.
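The VOC training configuration quoted above can be collected into a small sketch. This is a hedged, illustrative summary in plain Python, not the authors' released code; the constant names and the `learning_rate` helper are assumptions introduced here, while the numeric values come directly from the paper's setup description.

```python
# Illustrative constants from the quoted R-FCN fine-tuning setup on VOC.
# Names are hypothetical; values are taken from the paper's description.

WEIGHT_DECAY = 0.0005        # "a weight decay of 0.0005"
MOMENTUM = 0.9               # "a momentum of 0.9"
SHORTER_SIDE_PX = 600        # single-scale training: shorter image side = 600
IMAGES_PER_GPU = 1           # "Each GPU holds 1 image"
NUM_GPUS = 8                 # "We train the model with 8 GPUs"
EFFECTIVE_BATCH = IMAGES_PER_GPU * NUM_GPUS   # effective mini-batch size of 8
ROIS_PER_IMAGE = 128         # "selects B = 128 RoIs for backprop"


def learning_rate(minibatch: int) -> float:
    """Step LR schedule on VOC: 0.001 for the first 20k mini-batches,
    then 0.0001 for the next 10k (30k mini-batches total)."""
    if minibatch < 20_000:
        return 0.001
    if minibatch < 30_000:
        return 0.0001
    raise ValueError("the described VOC schedule ends at 30k mini-batches")
```

For example, `learning_rate(19_999)` returns 0.001 while `learning_rate(20_000)` returns 0.0001, reproducing the two-step schedule the row describes.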