Recurrent Attentional Reinforcement Learning for Multi-Label Image Recognition

Authors: Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MSCOCO) show that our model achieves superior performance over existing state-of-the-art methods in both performance and efficiency as well as explicitly identifying image-level semantic labels to specific object regions."
Researcher Affiliation | Collaboration | Tianshui Chen (1), Zhouxia Wang (1,2), Guanbin Li (1), Liang Lin (1,2); 1: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; 2: SenseTime Group Limited
Pseudocode | No | The paper describes the algorithms and processes using mathematical formulas and text, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | "Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MSCOCO) show that our model achieves superior performance..." The benchmarks are Pascal VOC 2007 (VOC07) (Everingham et al. 2010) and Microsoft COCO (MS-COCO) (Lin et al. 2014).
Dataset Splits | Yes | "The VOC07 dataset contains 9,963 images of 20 object categories, and it is divided into trainval and test sets... The MS-COCO dataset is originally built for object detection and has also been used for multi-label recognition recently. It is a larger and more challenging dataset, which comprises a training set of 82,081 images and a validation set of 40,137 images from 80 object categories."
Hardware Specification | Yes | "We test our model on a desktop with a single NVIDIA GeForce GTX TITAN-X GPU."
Software Dependencies | No | The paper mentions using the VGG16 ConvNet and the Adam solver, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "During training, all the images are resized to N × N, and randomly cropped with a size of (N − 64) × (N − 64), followed by a random horizontal flipping, for data augmentation. In our experiments, we train two models with N = 512 and N = 640, respectively. For the anchor strategy, we set 3 region scales with area 80 × 80, 160 × 160, 320 × 320 for N = 512 and 100 × 100, 200 × 200, 400 × 400 for N = 640, and 3 aspect ratios of 2:1, 1:1, 1:2 for both scales. Thus, k is set as 9. Both of the models are optimized using the Adam solver with a batch size of 16, an initial learning rate of 0.00001, momentums of 0.9 and 0.999."
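The anchor strategy quoted above (3 area scales crossed with 3 aspect ratios, giving k = 9) can be sketched as follows. This is a minimal illustration of the arithmetic only, not the authors' code; the function name and interface are assumptions.

```python
import math

def make_anchors(scales, ratios):
    """Return (width, height) pairs for every combination of scale and
    aspect ratio. Each scale s denotes the side of a square anchor, so
    the target area is s * s; each ratio r is width / height, and the
    returned box preserves the target area at that ratio."""
    anchors = []
    for s in scales:
        area = s * s
        for r in ratios:
            h = math.sqrt(area / r)  # solve w * h = area with w = r * h
            w = r * h
            anchors.append((w, h))
    return anchors

# Scales for the N = 512 model per the paper: areas 80x80, 160x160, 320x320,
# with aspect ratios 2:1, 1:1, 1:2.
anchors_512 = make_anchors([80, 160, 320], [2.0, 1.0, 0.5])
print(len(anchors_512))  # 9, matching k = 9 in the paper
```

For the N = 640 model the same call would use scales [100, 200, 400]. Note that each anchor keeps the area of its scale exactly, so the 2:1 anchor at scale 80 has width ≈ 113 and height ≈ 57.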