Learning to Guide Decoding for Image Captioning

Authors: Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

AAAI 2018

Reproducibility assessment (variable: result, followed by the supporting LLM response):
Research Type: Experimental. The advantages of our proposed approach are verified by experiments carried out on the MS COCO dataset.
Researcher Affiliation: Collaboration. Tencent AI Lab; Wuhan University; Nanyang Technological University.
Pseudocode: No. The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code: No. The paper does not provide concrete access to source code (a repository link, an explicit code-release statement, or code in supplementary materials) for the described methodology.
Open Datasets: Yes. The MS COCO dataset (Lin et al. 2014) is the most popular benchmark for image captioning; it contains about 123,000 images, each paired with five captions.
Dataset Splits: Yes. Following the conventional evaluation procedure (Mun, Cho, and Han 2017; Yao et al. 2016; Yang et al. 2016), the same data split as in (Karpathy and Fei-Fei 2015) is used for the performance comparisons: 5,000 images and their associated captions are reserved for validation, another 5,000 for testing, and the rest are used for training. A minimal split sketch appears after these rows.
Hardware Specification: No. The paper does not report the hardware used for its experiments (exact GPU/CPU models, processor speeds, memory amounts, or other machine specifications).
Software Dependencies: No. The paper mentions optimization algorithms and network types (e.g., AdaGrad, LSTM) but does not specify versions for any key software components or libraries (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup: Yes. The LSTM size is set to 2048, and the LSTM parameters are initialized from a uniform distribution over [-0.1, 0.1]. AdaGrad (Duchi, Hazan, and Singer 2011) is used to optimize the network, with the learning rate set to 0.01 and the weight decay set to 10^-4. An early-stopping strategy is used to prevent overfitting: once the evaluation measure on the validation set, specifically CIDEr, reaches its maximum value, training is terminated and the corresponding model is used for testing. For sentence generation in the testing stage..., k is set to 3 for all experiments. A configuration sketch appears after these rows.
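
To make the split row concrete, here is a minimal Python sketch reproducing the split sizes of Karpathy and Fei-Fei (2015): 5,000 validation images, 5,000 test images, and the remainder for training. The function name, random seed, and annotation file path are illustrative assumptions; the published split assigns fixed image ids rather than shuffling randomly.

```python
# Hypothetical sketch of the split described above; the paper releases no code.
import json
import random

def karpathy_style_split(image_ids, n_val=5000, n_test=5000, seed=123):
    """Split MS COCO image ids into train/val/test with the sizes used by
    Karpathy and Fei-Fei (2015). Only the sizes match the published split;
    the actual split fixes specific image ids."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test

# Example usage with a hypothetical COCO annotation file:
# coco = json.load(open("captions_trainval2014.json"))
# train_ids, val_ids, test_ids = karpathy_style_split(
#     {img["id"] for img in coco["images"]})
```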
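The experiment-setup row can likewise be sketched in code. The paper does not name a deep-learning framework, so the PyTorch-style decoder below, the embedding and vocabulary sizes, the epoch budget, and the evaluate_cider_on_validation placeholder are all assumptions; only the LSTM size, the uniform initialization range, the AdaGrad settings, early stopping on validation CIDEr, and k = 3 come from the paper.

```python
# Minimal sketch of the reported training setup, assuming a PyTorch-like stack.
import torch
import torch.nn as nn

HIDDEN_SIZE = 2048      # "LSTM size is set as 2048"
LEARNING_RATE = 0.01    # AdaGrad learning rate
WEIGHT_DECAY = 1e-4     # weight decay 10^-4
K = 3                   # k = 3 reported for sentence generation at test time

class CaptionDecoder(nn.Module):
    """Illustrative LSTM decoder; layer layout is an assumption."""
    def __init__(self, vocab_size, embed_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, HIDDEN_SIZE, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, vocab_size)
        # LSTM parameters initialized uniformly in [-0.1, 0.1], as reported.
        for p in self.lstm.parameters():
            nn.init.uniform_(p, -0.1, 0.1)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)

def evaluate_cider_on_validation(model):
    # Placeholder: a real implementation would decode captions for the 5,000
    # validation images and score them with the CIDEr metric.
    return 0.0

model = CaptionDecoder(vocab_size=10000)  # vocabulary size is illustrative
optimizer = torch.optim.Adagrad(model.parameters(),
                                lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

# Early stopping on validation CIDEr, as described in the setup row above.
best_cider, best_state = float("-inf"), None
for epoch in range(50):  # epoch budget is an assumption
    # ... one training pass over the training captions would go here ...
    cider = evaluate_cider_on_validation(model)
    if cider > best_cider:
        best_cider, best_state = cider, model.state_dict()
    else:
        break  # stop once validation CIDEr no longer improves
```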