Object Instance Mining for Weakly Supervised Object Detection

Authors: Chenhao Lin, Siwen Wang, Dongqi Xu, Yu Lu, Wayne Zhang (pp. 11482-11489)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.
Researcher Affiliation Collaboration Chenhao Lin,1 Siwen Wang,2 Dongqi Xu,1 Yu Lu,1 Wayne Zhang1 1Sense Time Research 2Dalian University of Technology, Dalian, China, 116024
Pseudocode Yes Algorithm 1: Object Instance Mining
Open Source Code Yes https://github.com/bigvideoresearch/OIM
Open Datasets Yes Following the previous state-of-the-art methods on WSOD, we also evaluate our approach on two datasets, PASCAL VOC2007 (Everingham et al. 2010) and VOC2012 (Everingham et al. 2015), which both contain 20 object categories.
Dataset Splits Yes For VOC2007, we train the model on the trainval set (5,011 images) and evaluate the performance on the test set (4,952 images). For VOC2012, the trainval set (11,540 images) and the test set (10,991 images) are used for training and evaluation respectively. Additionally, we train our model on the VOC2012 train set (5,717 images) and perform evaluation on the val set (5,823 images) to further validate the effectiveness of the proposed approach.
Hardware Specification No The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies No The paper mentions a 'VGG16 model pre-trained on the ImageNet dataset' but does not specify version numbers for any software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow, Caffe) or programming languages.
Experiment Setup Yes The batch size is set to 2, and the learning rates are set to 0.001 and 0.0001 for the first 40K and the following 50K iterations respectively. During training and test, we take five image scales {480, 576, 688, 864, 1200} along with random horizontal flipping for data augmentation. Following (Tang et al. 2017), the threshold T is set to 0.5. As the number of iterations increases, the network learns more stably, so we dynamically set the hyperparameter α to α1 = 5 for the first 70K iterations and α2 = 2 for the following 20K iterations. β is empirically set to 0.2 in our experiments.
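The schedule quoted above can be sketched in code. This is a minimal illustration of the reported hyperparameters, not the authors' actual configuration file; the function and constant names are hypothetical.

```python
# Hypothetical sketch of the training schedule reported in the paper
# (batch size 2, stepped learning rate, dynamic alpha, fixed beta/threshold).

IMAGE_SCALES = [480, 576, 688, 864, 1200]  # shorter-side scales used for train/test augmentation
BATCH_SIZE = 2
THRESHOLD_T = 0.5  # following Tang et al. 2017
BETA = 0.2         # empirically set weight

def learning_rate(iteration: int) -> float:
    """Step schedule: 0.001 for the first 40K iterations, 0.0001 for the next 50K."""
    return 1e-3 if iteration < 40_000 else 1e-4

def alpha(iteration: int) -> int:
    """Dynamic hyperparameter: alpha1 = 5 for the first 70K iterations, alpha2 = 2 after."""
    return 5 if iteration < 70_000 else 2
```

Total training length implied by the schedule is 90K iterations (40K + 50K), which matches the 70K + 20K split used for α.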