Temporal-Difference Learning With Sampling Baseline for Image Captioning

Authors: Hui Chen, Guiguang Ding, Sicheng Zhao, Jungong Han

Venue: AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our proposed method can improve the quality of generated captions and outperforms the state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics. We conduct extensive experiments and comparisons with other methods. The results demonstrate that the proposed method has a significant superiority over the state-of-the-art methods.
Researcher Affiliation | Academia | School of Software, Tsinghua University, Beijing 100084, China; School of Computing and Communications, Lancaster University, Lancaster, LA1 4YW, UK; {jichenhui2012,schzhao,jungonghan77}@gmail.com, dinggg@tsinghua.edu.cn
Pseudocode | No | The paper describes the method using mathematical equations and textual explanations, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper states 'We use the code publicly to preprocess the dataset', with a footnote pointing to https://github.com/karpathy/neuraltalk. This refers to third-party preprocessing code, not the authors' own source code for the method described in the paper.
Open Datasets | Yes | We evaluate our proposed method on the popular MS COCO dataset (Lin et al. 2014).
Dataset Splits | Yes | The MS COCO dataset contains 123,287 images labeled with at least 5 captions each, comprising 82,783 training images and 40,504 validation images; MS COCO also provides 40,775 images as a test set for online evaluation. Since the standard test set is not public, we use 5,000 images for validation, 5,000 images for test, and the remaining images for training, as in previous works (Xu et al. 2015; You et al. 2016; Chen et al. 2017c) for offline evaluation (see the split sketch below the table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processing units) used for running its experiments.
Software Dependencies | No | The paper mentions software components like ResNet-101, the ADAM optimizer, and LSTM, but does not provide specific version numbers for these or any other software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | We train models under the XENT loss using the ADAM optimizer with a learning rate of 5×10^-4 and finetune the CNN from the beginning. For all models, the batch size is set to 16, and model evaluation is performed every 1K iterations during training. When training models under the RL loss, the learning rate for the language model is initialized to 1×10^-4, set to 5×10^-5 after 50K iterations, and then decreased by 1×10^-5 every 100K iterations until it reaches 1×10^-5. By default, the beam search size is fixed to 3 for all models at test time (see the schedule sketch below the table).
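
A minimal Python sketch of the offline split quoted in the Dataset Splits row, assuming the common practice of pooling the official MS COCO train2014 and val2014 images and carving out 5,000 validation and 5,000 test images; the function name, the random seed, and the shuffling step are illustrative assumptions, not the authors' released code.

```python
import random

# Hedged sketch of the offline "Karpathy-style" split described above:
# 5,000 validation images, 5,000 test images, and the remaining images
# from the pooled MS COCO train2014 + val2014 sets used for training.
def make_offline_split(coco_image_ids, seed=123):
    """coco_image_ids: iterable of image ids from the MS COCO train2014 + val2014 sets."""
    rng = random.Random(seed)
    images = list(coco_image_ids)
    rng.shuffle(images)  # whether/how the images are shuffled is an assumption
    val_ids = images[:5000]
    test_ids = images[5000:10000]
    train_ids = images[10000:]
    return {"train": train_ids, "val": val_ids, "test": test_ids}

if __name__ == "__main__":
    # Example with dummy ids; in practice these would come from the COCO annotations
    # (82,783 train + 40,504 val = 123,287 images).
    split = make_offline_split(range(123287))
    print({name: len(ids) for name, ids in split.items()})
```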
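
And a hedged sketch of the optimization schedule quoted in the Experiment Setup row. The constants mirror the quoted values; the `rl_learning_rate` helper is an assumption about how the quoted RL-phase numbers combine (in particular, when each 100K-iteration decrement begins), not the authors' implementation.

```python
# Constants taken from the quoted experiment setup.
XENT_LR = 5e-4      # ADAM learning rate for cross-entropy (XENT) training
BATCH_SIZE = 16     # batch size for all models
EVAL_EVERY = 1000   # model evaluation every 1K training iterations
BEAM_SIZE = 3       # beam search width used at test time

def rl_learning_rate(iteration):
    """Assumed language-model learning rate during RL training.

    Starts at 1e-4, drops to 5e-5 after 50K iterations, then decreases by
    1e-5 every further 100K iterations, with a floor of 1e-5.
    """
    if iteration < 50_000:
        return 1e-4
    lr = 5e-5 - 1e-5 * ((iteration - 50_000) // 100_000)
    return max(lr, 1e-5)

if __name__ == "__main__":
    # Print the assumed learning rate at a few milestone iterations so the
    # schedule can be checked against the paper's description.
    for it in (0, 49_999, 50_000, 150_000, 450_000):
        print(it, rl_learning_rate(it))
```

Running the `__main__` block prints the assumed learning rate at a few milestone iterations, which makes the schedule easy to compare against the quoted description.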