Image Captioning with Visual-Semantic LSTM

Authors: Nannan Li, Zhenzhong Chen

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on MS COCO and Flickr30K validate the effectiveness of our approach when compared to the state-of-the-art methods.
Researcher Affiliation | Academia | Nannan Li, Zhenzhong Chen, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, P.R. China, {live, zzchen}@whu.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements or links for open-source code for the methodology described.
Open Datasets | Yes | We conduct experiments on Flickr30K [Young et al., 2014] and MS COCO [Lin et al., 2014] datasets which have 31,783 and 123,287 annotated images, respectively.
Dataset Splits | Yes | We use the public available splits [Karpathy and Fei-Fei, 2015] which has 5000 randomly selected images for validation and test.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions several models and optimizers (VGG16, ResNet-101, Faster R-CNN, Adam optimizer, Batch Normalization) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In the LSTM model, the number of hidden nodes of the LSTM is set to 512, with word embedding size of 512. The reward function in reinforcement learning is set to be the CIDEr score. The robust parameter γ of the REINFORCE sampling strategy is set to 0.5 from experimental results. In training, we use the Adam optimizer with learning rate decay and set initial learning rate of 5 × 10⁻⁴. We use 0.5 dropouts of the output and feed back 5% of sampled words every 4 epochs until reaching a 25% feeding back rate [Bengio et al., 2015]. A batch normalization layer [Ioffe and Szegedy, 2015] is added to the beginning of the image encoder to accelerate training with mini-batch size of 50.
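
The quoted experiment setup can be gathered into a short configuration sketch. The code below is a minimal, hypothetical Python/PyTorch rendering of the reported hyperparameters only; the module names, the 2048-dimensional image-feature size, and the concrete learning-rate decay schedule are assumptions, since neither the paper nor this record specifies them.

# Hypothetical sketch of the reported training configuration; not the authors' code.
import torch
import torch.nn as nn

config = {
    "lstm_hidden_size": 512,          # number of hidden nodes of the LSTM
    "word_embedding_size": 512,       # word embedding dimension
    "rl_reward": "CIDEr",             # reward used in the reinforcement-learning stage
    "gamma_sampling": 0.5,            # robust parameter γ of the REINFORCE sampling strategy
    "initial_lr": 5e-4,               # Adam initial learning rate (with decay)
    "dropout": 0.5,                   # dropout applied to the LSTM output
    "scheduled_sampling_step": 0.05,  # feed back 5% of sampled words ...
    "scheduled_sampling_every": 4,    # ... every 4 epochs ...
    "scheduled_sampling_max": 0.25,   # ... up to a 25% feeding-back rate
    "batch_size": 50,
}

# Batch normalization at the beginning of the image encoder, as described;
# the 2048-dim input feature size is an assumption.
image_encoder = nn.Sequential(
    nn.BatchNorm1d(2048),
    nn.Linear(2048, config["lstm_hidden_size"]),
)

decoder = nn.LSTM(
    input_size=config["word_embedding_size"],
    hidden_size=config["lstm_hidden_size"],
    batch_first=True,
)
output_dropout = nn.Dropout(config["dropout"])

optimizer = torch.optim.Adam(
    list(image_encoder.parameters()) + list(decoder.parameters()),
    lr=config["initial_lr"],
)
# Learning-rate decay is reported but unspecified; this StepLR schedule is an assumption.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.8)


def scheduled_sampling_prob(epoch: int) -> float:
    """Probability of feeding back sampled words instead of ground-truth words."""
    return min(
        config["scheduled_sampling_max"],
        (epoch // config["scheduled_sampling_every"]) * config["scheduled_sampling_step"],
    )

Under this reading, scheduled_sampling_prob follows the quoted schedule: 0% for epochs 0-3, 5% for epochs 4-7, 10% for epochs 8-11, and so on, capped at 25%.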