Hierarchical Attention Network for Image Captioning

Authors: Weixuan Wang, Zhihong Chen, Haifeng Hu (pp. 8957-8964)

AAAI 2019

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | The HAN is verified on the benchmark MSCOCO dataset, and the experimental results indicate that our model outperforms the state-of-the-art methods, achieving a BLEU-1 score of 80.9 and a CIDEr score of 121.7 on the Karpathy test split.
Researcher Affiliation | Academia | Weixuan Wang, Zhihong Chen, Haifeng Hu; School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510275, China. {wangwx25, chenzhh45}@mail2.sysu.edu.cn, huhaif@mail.sysu.edu.cn
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | The MSCOCO dataset (Lin et al. 2014) is the benchmark dataset for image captioning; it contains 82,783, 40,504, and 40,775 images for training, validation, and test respectively. For offline evaluation, we employ the Karpathy splits (Karpathy and Li 2015), which contain 113,287 images for training, 5,000 images for validation, and 5,000 images for test.
Dataset Splits | Yes | For offline evaluation, we employ the Karpathy splits (Karpathy and Li 2015), which contain 113,287 images for training, 5,000 images for validation, and 5,000 images for test.
Hardware Specification | No | The paper mentions "Due to the limitation of the hardware" but does not specify any hardware details such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using ResNet-101, Faster R-CNN, the Adam optimizer, and LSTMs, but does not provide specific version numbers for any software components or libraries.
Experiment Setup | Yes | The dimensions of these features are reduced to 512. The dimensions of the embedding layers and both LSTMs are set to 512. First, we train our model under cross-entropy (XE) loss using the Adam optimizer with a learning rate of 5e-4 and do not finetune the CNN. Afterwards, we perform CIDEr optimization on the XE-trained model, also using the Adam optimizer. In the decoding process, we use beam search with a beam size of 3.
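The reported hyperparameters can be collected into a single configuration object for anyone attempting a reproduction. This is a minimal sketch; the class and field names are illustrative assumptions, not taken from any released codebase (the paper publishes none), and only the numeric values come from the paper.

```python
from dataclasses import dataclass


@dataclass
class HANTrainConfig:
    """Hypothetical config mirroring the hyperparameters reported in the paper."""
    feature_dim: int = 512          # image features are reduced to 512 dimensions
    embed_dim: int = 512            # embedding-layer dimension
    lstm_dim: int = 512             # hidden size of both LSTMs
    xe_learning_rate: float = 5e-4  # Adam learning rate for cross-entropy training
    finetune_cnn: bool = False      # the CNN is not finetuned during XE training
    rl_optimizer: str = "adam"      # CIDEr optimization also uses Adam
    beam_size: int = 3              # beam search width at decoding time


cfg = HANTrainConfig()
print(cfg.feature_dim, cfg.xe_learning_rate, cfg.beam_size)
```

Unspecified details (batch size, number of epochs, learning-rate schedule for the CIDEr stage) are deliberately omitted, since the paper does not report them.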