Text-Guided Attention Model for Image Captioning

Authors: Jonghwan Mun, Minsu Cho, Bohyung Han

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our model on MSCOCO Captioning benchmark and achieve the state-of-the-art performance in standard metrics." "Experiments: This section describes our experimental setting and presents quantitative and qualitative results of our algorithm in comparison to recent methods."
Researcher Affiliation | Academia | Jonghwan Mun, Minsu Cho, Bohyung Han; Department of Computer Science and Engineering, POSTECH, Korea; {choco1916, mscho, bhhan}@postech.ac.kr
Pseudocode | No | The paper describes its algorithm in prose; no structured pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | "We train our model on MS-COCO dataset (Lin et al. 2014), which contains 123,287 images."
Dataset Splits | Yes | "The images are divided into 82,783 training images and 40,504 validation images. Each split of validation and testing data contains randomly selected 5,000 images from the original validation images." (An illustrative split sketch follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments are provided in the paper.
Software Dependencies | No | The paper mentions optimizers and neural network architectures but does not specify version numbers for any key software libraries or dependencies.
Experiment Setup | Yes | "In the decoder, the dimensionalities of the word embedding space and the hidden state of LSTM are set to 512. We use Adam (Kingma and Ba 2015) to learn the model with mini-batch size of 80, where dropouts with 0.5 are applied to the output layer of decoder. The learning rate starts from 0.0004 and after 10 epochs decays by the factor of 0.8 at every three epoch. ... scheduled sampling ... integrated in our learning procedure after 10 epochs with ground-truth word selection probability fixed to 0.75. ... we fix n = 60 and k = 10 in both training and testing." (A configuration sketch follows the table.)
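
The Dataset Splits row describes a common MS-COCO evaluation protocol: two disjoint subsets of 5,000 images each are drawn at random from the 40,504 original validation images to serve as the validation and test sets. The paper does not publish split code, so the following is only a minimal sketch of that procedure; the function name make_eval_splits, the val_image_ids input, and the fixed seed are illustrative assumptions.

```python
import random

def make_eval_splits(val_image_ids, n_val=5000, n_test=5000, seed=0):
    """Draw disjoint validation and test subsets from the original MS-COCO
    validation image IDs, as described in the Dataset Splits row above.
    The seed and the use of random.Random are illustrative assumptions."""
    rng = random.Random(seed)
    shuffled = list(val_image_ids)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    new_val = shuffled[:n_val]
    new_test = shuffled[n_val:n_val + n_test]
    remainder = shuffled[n_val + n_test:]
    return new_val, new_test, remainder

# Dummy IDs standing in for the 40,504 original validation images.
val_ids, test_ids, rest = make_eval_splits(range(40504))
print(len(val_ids), len(test_ids), len(rest))  # 5000 5000 30504
```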
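
The Experiment Setup row quotes enough hyperparameters to outline the decoder and optimization configuration. The PyTorch-style sketch below is a rough illustration under stated assumptions, not the authors' released implementation: the class name CaptionDecoder, the placeholder vocabulary size, and the epoch at which the first learning-rate decay fires are assumptions, and the paper's text-guided attention module is omitted entirely.

```python
import random
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EMBED_DIM = 512    # word embedding dimensionality
HIDDEN_DIM = 512   # LSTM hidden-state dimensionality
DROPOUT_P = 0.5    # dropout on the decoder output layer
BATCH_SIZE = 80    # mini-batch size used with Adam
BASE_LR = 4e-4     # initial learning rate
LR_DECAY = 0.8     # decay factor applied every three epochs after epoch 10
SS_PROB = 0.75     # ground-truth word selection probability (scheduled sampling)

class CaptionDecoder(nn.Module):
    """Minimal LSTM decoder matching the reported dimensionalities; the
    text-guided attention component of the full model is not reproduced."""
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.dropout = nn.Dropout(DROPOUT_P)
        self.fc = nn.Linear(HIDDEN_DIM, vocab_size)

    def forward(self, tokens, state=None):
        out, state = self.lstm(self.embed(tokens), state)
        return self.fc(self.dropout(out)), state

def learning_rate(epoch, base_lr=BASE_LR, decay=LR_DECAY, warmup=10, every=3):
    """Starts at 0.0004 and decays by 0.8 every three epochs after epoch 10;
    the exact epoch of the first decay is an interpretation of the quote."""
    if epoch <= warmup:
        return base_lr
    return base_lr * decay ** ((epoch - warmup) // every)

def next_decoder_input(gt_token, predicted_token, epoch, rng=random):
    """Scheduled sampling after epoch 10: feed the ground-truth word with
    probability 0.75, otherwise the model's own previous prediction."""
    if epoch <= 10 or rng.random() < SS_PROB:
        return gt_token
    return predicted_token

decoder = CaptionDecoder(vocab_size=10000)  # vocabulary size is a placeholder
optimizer = torch.optim.Adam(decoder.parameters(), lr=BASE_LR)
```

In an actual training loop, learning_rate(epoch) would be written into optimizer.param_groups at the start of each epoch, and next_decoder_input would be consulted at every decoding step once the scheduled-sampling phase begins.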