Object Instance Mining for Weakly Supervised Object Detection
Authors: Chenhao Lin, Siwen Wang, Dongqi Xu, Yu Lu, Wayne Zhang
AAAI 2020, pp. 11482-11489
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on two publicly available datasets, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach. |
| Researcher Affiliation | Collaboration | Chenhao Lin,1 Siwen Wang,2 Dongqi Xu,1 Yu Lu,1 Wayne Zhang1 (1SenseTime Research; 2Dalian University of Technology, Dalian, China, 116024) |
| Pseudocode | Yes | Algorithm 1: Object Instance Mining |
| Open Source Code | Yes | https://github.com/bigvideoresearch/OIM |
| Open Datasets | Yes | Following the previous state-of-the-art methods on WSOD, we also evaluate our approach on two datasets, PASCAL VOC 2007 (Everingham et al. 2010) and VOC 2012 (Everingham et al. 2015), which both contain 20 object categories. |
| Dataset Splits | Yes | For VOC2007, we train the model on the trainval set (5,011 images) and evaluate the performance on the test set (4,952 images). For VOC2012, the trainval set (11,540 images) and the test set (10,991 images) are used for training and evaluation respectively. Additionally, we train our model on the VOC2012 train set (5,717 images) and perform evaluation on the val set (5,823 images) to further validate the effectiveness of the proposed approach. (A loading sketch for these splits follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions a 'VGG16 model pre-trained on the ImageNet dataset' but does not specify version numbers for any software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow, Caffe) or programming languages. |
| Experiment Setup | Yes | The batch size is set to 2, and the learning rates are set to 0.001 and 0.0001 for the first 40K and the following 50K iterations respectively. During training and test, we take five image scales {480, 576, 688, 864, 1200} along with random horizontal flipping for data augmentation. Following (Tang et al. 2017), the threshold T is set to 0.5. As the number of iterations increases, the network learns more stably, so we dynamically set the hyperparameter α to α1 = 5 for the first 70K iterations and α2 = 2 for the following 20K iterations. β is empirically set to 0.2 in our experiments. (A schedule sketch follows the table.) |
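
The dataset splits quoted above are the standard PASCAL VOC distributions. Below is a minimal sketch of loading them with `torchvision.datasets.VOCDetection`; the paper does not state which framework or data loader it used, so this is purely illustrative (the VOC 2012 test annotations are held out on the official evaluation server and cannot be loaded this way).

```python
from torchvision.datasets import VOCDetection

# VOC2007: train on trainval (5,011 images), evaluate on test (4,952 images).
voc07_trainval = VOCDetection("data", year="2007", image_set="trainval", download=True)
voc07_test = VOCDetection("data", year="2007", image_set="test", download=True)

# VOC2012: train on trainval (11,540 images); the test set (10,991 images)
# is evaluated on the official server. The additional ablation split trains
# on train (5,717 images) and evaluates on val (5,823 images).
voc12_trainval = VOCDetection("data", year="2012", image_set="trainval", download=True)
voc12_train = VOCDetection("data", year="2012", image_set="train", download=True)
voc12_val = VOCDetection("data", year="2012", image_set="val", download=True)

print(len(voc07_trainval), len(voc07_test))  # expected: 5011 4952
```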
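For concreteness, the following sketch encodes the quoted training schedule (step learning-rate decay at 40K iterations, the dynamic α switch at 70K iterations, and the fixed β and threshold T) as plain Python. The function and constant names are hypothetical; the released OIM code may organize these hyperparameters differently.

```python
def learning_rate(iteration: int) -> float:
    """0.001 for the first 40K iterations, 0.0001 for the following 50K."""
    return 1e-3 if iteration < 40_000 else 1e-4

def alpha(iteration: int) -> float:
    """alpha_1 = 5 for the first 70K iterations, alpha_2 = 2 for the last 20K."""
    return 5.0 if iteration < 70_000 else 2.0

BATCH_SIZE = 2
BETA = 0.2                           # empirically chosen weight from the paper
THRESHOLD_T = 0.5                    # following Tang et al. 2017
SCALES = (480, 576, 688, 864, 1200)  # image scales used in training and test

for it in range(90_000):             # 40K + 50K iterations in total
    lr, a = learning_rate(it), alpha(it)
    # ... one SGD step at learning rate `lr` on a batch of BATCH_SIZE images,
    # each resized to a random scale from SCALES and randomly flipped,
    # with the loss terms weighted by `a` and BETA ...
```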