Multi-Level Policy and Reward Reinforcement Learning for Image Captioning
Authors: Anan Liu, Ning Xu, Hanwang Zhang, Weizhi Nie, Yuting Su, Yongdong Zhang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework can achieve competing performances with respect to different evaluation metrics. We perform comprehensive evaluations on MSCOCO and Flickr30k datasets. Our framework achieves the competing performances against state-of-the-art methods. Ablative studies showcase the effect of the proposed framework. |
| Researcher Affiliation | Academia | 1 School of Electrical and Information Engineering, Tianjin University, Tianjin, China 2 School of Computer Science and Engineering, Nanyang Technological University, Singapore 3 University of Science and Technology of China, Hefei, China liuanan@tju.edu.cn |
| Pseudocode | No | The paper describes its approach and training process using textual descriptions and mathematical formulations (e.g., equations 1-9), but it does not include any formally labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the described methodology or a direct link to a code repository. It only refers to a third-party evaluation tool: 'Microsoft COCO caption evaluation tool, https://github.com/tylin/coco-caption'. |
| Open Datasets | Yes | We evaluate our framework on captioning datasets: MSCOCO and Flickr30k. For fair comparison, we adopt the splits consistent with [Karpathy and Fei-Fei, 2017]. |
| Dataset Splits | Yes | For fair comparison, we adopt the splits consistent with [Karpathy and Fei-Fei, 2017], which uses 5,000 images for validation and test on MSCOCO; 1,000 images for validation and test on Flickr30k. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running the experiments. It only describes software settings. |
| Software Dependencies | No | The paper states 'All experiments are implemented by PyTorch,' but it does not specify the version of PyTorch or of any other software dependency, such as the COCO evaluation tool it mentions. |
| Experiment Setup | Yes | As shown in Figure 1, we take the output of the 2048-d pool5 layer from ResNet-101 as image feature I. We use one LSTM unit with 2048-d hidden layers to construct the RNN, and the dimension of both linear mapping layers is set to 2048→512. In training, the LSTM hidden, image, word and attention embedding dimensions are fixed to 512 for the word-level policy. We use the Adam optimizer with an initial learning rate of 5×10⁻⁵ and minibatches of size 64. The maximum number of epochs is 30. The margin λ in Eq. 5, β in Eq. 9, and γ in Eq. 4 are set as 0.6, 0.6, and 0.2, respectively. In testing, the beam search width is set to 1. |
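Since the authors released no code, the reported setup can only be approximated. The sketch below wires up the stated dimensions in PyTorch: a 2048-d pool5 ResNet-101 feature mapped to a 512-d embedding, a single LSTM with a 2048-d hidden state, and an Adam optimizer at learning rate 5×10⁻⁵. The class name, vocabulary size, and forward-pass layout (image feature fed as the first LSTM input) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CaptionerSketch(nn.Module):
    """Hedged approximation of the dimensions reported in the paper;
    names and the decoding layout are assumptions."""

    def __init__(self, vocab_size=10000, feat_dim=2048, embed_dim=512,
                 hidden_dim=2048):
        super().__init__()
        # 2048-d pool5 feature -> 512-d embedding (one of the "2048→512"
        # linear mapping layers)
        self.img_embed = nn.Linear(feat_dim, embed_dim)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # one LSTM unit with a 2048-d hidden state, as reported
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, words):
        img = self.img_embed(feats).unsqueeze(1)           # (B, 1, 512)
        seq = torch.cat([img, self.word_embed(words)], 1)  # (B, 1+T, 512)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                            # (B, 1+T, vocab)

model = CaptionerSketch()
# Reported optimizer settings: Adam, lr 5e-5; the paper also uses
# minibatches of 64, at most 30 epochs, and beam width 1 at test time.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 5)))
```

`logits` has shape `(2, 6, 10000)`: one step for the image feature plus five word steps, each scoring the full vocabulary.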