Recurrent Attentional Reinforcement Learning for Multi-Label Image Recognition
Authors: Tianshui Chen, Zhouxia Wang, Guanbin Li, Liang Lin
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MS-COCO) show that the model surpasses existing state-of-the-art methods in both accuracy and efficiency, while explicitly associating image-level semantic labels with specific object regions. |
| Researcher Affiliation | Collaboration | Tianshui Chen (1), Zhouxia Wang (1,2), Guanbin Li (1), Liang Lin (1,2). (1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; (2) SenseTime Group Limited |
| Pseudocode | No | The paper describes the algorithms and processes using mathematical formulas and text, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MSCOCO) show that our model achieves superior performance... Pascal VOC 2007 (VOC07) (Everingham et al. 2010) and Microsoft COCO (MS-COCO) (Lin et al. 2014). |
| Dataset Splits | Yes | The VOC07 dataset contains 9,963 images of 20 object categories, and it is divided into trainval and test sets... The MS-COCO dataset is originally built for object detection and has also been used for multi-label recognition recently. It is a larger and more challenging dataset, which comprises a training set of 82,081 images and a validation set of 40,137 images from 80 object categories. |
| Hardware Specification | Yes | We test our model on a desktop with a single NVIDIA GeForce GTX TITAN-X GPU. |
| Software Dependencies | No | The paper mentions using the VGG16 ConvNet and the Adam solver, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | During training, all the images are resized to N × N, and randomly cropped with a size of (N − 64) × (N − 64), followed by a random horizontal flipping, for data augmentation. In our experiments, we train two models with N = 512 and N = 640, respectively. For the anchor strategy, we set 3 region scales with areas 80 × 80, 160 × 160, 320 × 320 for N = 512 and 100 × 100, 200 × 200, 400 × 400 for N = 640, and 3 aspect ratios of 2:1, 1:1, 1:2 for both scales. Thus, k is set as 9. Both of the models are optimized using the Adam solver with a batch size of 16, an initial learning rate of 0.00001, and momentums of 0.9 and 0.999. |
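The anchor strategy quoted above (3 region scales × 3 aspect ratios, giving k = 9 anchors per location) can be sketched as follows. This is a minimal illustration, not code from the paper; the helper name `anchor_shapes` and the area-preserving ratio convention are assumptions, chosen to match the standard region-proposal anchor formulation.

```python
from itertools import product


def anchor_shapes(scales, ratios):
    """Return (width, height) pairs for every scale/ratio combination.

    Each scale is the side length of a square reference anchor; an
    aspect ratio r = w/h rescales it while keeping the area fixed.
    (Assumed convention; the paper only lists the scales and ratios.)
    """
    shapes = []
    for s, r in product(scales, ratios):
        area = s * s                # e.g. 80 x 80 region scale
        h = (area / r) ** 0.5       # solve w*h = area with w = r*h
        w = r * h
        shapes.append((round(w), round(h)))
    return shapes


# Settings quoted for the N = 512 model: scales 80, 160, 320 and
# aspect ratios 2:1, 1:1, 1:2, so k = 3 * 3 = 9 anchors per location.
anchors_512 = anchor_shapes([80, 160, 320], [2.0, 1.0, 0.5])
print(len(anchors_512))  # k = 9

# The two models are then trained with Adam: batch size 16,
# learning rate 1e-5, betas (0.9, 0.999), per the quoted setup.
```

For the N = 640 model, the same call with scales [100, 200, 400] reproduces its 9 anchor shapes.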