Learning to Guide Decoding for Image Captioning

Authors: Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

AAAI 2018

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. The advantages of our proposed approach are verified by experiments carried out on the MS COCO dataset.
Researcher Affiliation: Collaboration. Tencent AI Lab; Wuhan University; Nanyang Technological University.
Pseudocode: No. The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code: No. The paper does not provide concrete access to source code (a repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology.
Open Datasets: Yes. The MS COCO dataset (Lin et al. 2014) is the most popular benchmark for image captioning; it contains about 123,000 images, each paired with five captions.
Dataset Splits: Yes. Following the conventional evaluation procedure (Mun, Cho, and Han 2017; Yao et al. 2016; Yang et al. 2016), the same data split as in (Karpathy and Fei-Fei 2015) is used for the performance comparisons: 5,000 images and their associated captions are reserved for validation, another 5,000 for testing, and the rest are used for training. A minimal split sketch appears after these rows.
Hardware Specification: No. The paper does not report the hardware used for its experiments (exact GPU/CPU models, processor speeds, memory amounts, or other machine specifications).
Software Dependencies: No. The paper mentions optimization algorithms and network types (e.g., AdaGrad, LSTM) but does not specify versions for any key software components or libraries (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup: Yes. The LSTM size is set to 2048, and the LSTM parameters are initialized from a uniform distribution over [-0.1, 0.1]. AdaGrad (Duchi, Hazan, and Singer 2011) is used to optimize the network, with the learning rate set to 0.01 and the weight decay set to 10^-4. An early-stopping strategy is used to prevent overfitting: once the evaluation measure on the validation set, specifically CIDEr, reaches its maximum value, training is terminated and the corresponding model is used for testing. For sentence generation in the testing stage..., k is set to 3 for all experiments. A configuration sketch appears after these rows.
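
To make the split row concrete, here is a minimal Python sketch reproducing the split sizes of Karpathy and Fei-Fei (2015): 5,000 validation images, 5,000 test images, and the remainder for training. The function name, random seed, and annotation file path are illustrative assumptions; the published split assigns fixed image ids rather than shuffling randomly.

```python
# Hypothetical sketch of the split described above; the paper releases no code.
import json
import random

def karpathy_style_split(image_ids, n_val=5000, n_test=5000, seed=123):
    """Split MS COCO image ids into train/val/test with the sizes used by
    Karpathy and Fei-Fei (2015). Only the sizes match the published split;
    the actual split fixes specific image ids."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test

# Example usage with a hypothetical COCO annotation file:
# coco = json.load(open("captions_trainval2014.json"))
# train_ids, val_ids, test_ids = karpathy_style_split(
#     {img["id"] for img in coco["images"]})
```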
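The experiment-setup row can likewise be sketched in code. The paper does not name a deep-learning framework, so the PyTorch-style decoder below, the embedding and vocabulary sizes, the epoch budget, and the evaluate_cider_on_validation placeholder are all assumptions; only the LSTM size, the uniform initialization range, the AdaGrad settings, early stopping on validation CIDEr, and k = 3 come from the paper.

```python
# Minimal sketch of the reported training setup, assuming a PyTorch-like stack.
import torch
import torch.nn as nn

HIDDEN_SIZE = 2048      # "LSTM size is set as 2048"
LEARNING_RATE = 0.01    # AdaGrad learning rate
WEIGHT_DECAY = 1e-4     # weight decay 10^-4
K = 3                   # k = 3 reported for sentence generation at test time

class CaptionDecoder(nn.Module):
    """Illustrative LSTM decoder; layer layout is an assumption."""
    def __init__(self, vocab_size, embed_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, HIDDEN_SIZE, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, vocab_size)
        # LSTM parameters initialized uniformly in [-0.1, 0.1], as reported.
        for p in self.lstm.parameters():
            nn.init.uniform_(p, -0.1, 0.1)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)

def evaluate_cider_on_validation(model):
    # Placeholder: a real implementation would decode captions for the 5,000
    # validation images and score them with the CIDEr metric.
    return 0.0

model = CaptionDecoder(vocab_size=10000)  # vocabulary size is illustrative
optimizer = torch.optim.Adagrad(model.parameters(),
                                lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

# Early stopping on validation CIDEr, as described in the setup row above.
best_cider, best_state = float("-inf"), None
for epoch in range(50):  # epoch budget is an assumption
    # ... one training pass over the training captions would go here ...
    cider = evaluate_cider_on_validation(model)
    if cider > best_cider:
        best_cider, best_state = cider, model.state_dict()
    else:
        break  # stop once validation CIDEr no longer improves
```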