Exploring and Distilling Cross-Modal Information for Image Captioning

Authors: Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The experiments on the COCO image captioning dataset validate our argument and prove the effectiveness of the proposed approach. |
| Researcher Affiliation | Academia | (1) Shenzhen Key Lab for Information Centric Networking & Blockchain Technology (ICNLAB), School of Electronics and Computer Engineering (SECE), Peking University; (2) MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; (3) School of ICE, Beijing University of Posts and Telecommunications |
| Pseudocode | No | The paper describes the model architecture and equations but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. |
| Open Datasets | Yes | We evaluate the proposed approach on the widely used COCO dataset [Chen et al., 2015], which contains 123,287 images. |
| Dataset Splits | Yes | We use the publicly available splits of [Karpathy and Li, 2015] for offline evaluation. For COCO, there are 5,000 images each in the validation and test sets. |
| Hardware Specification | Yes | Time and speed are measured on a single NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions general software components (e.g., Python, PyTorch, TensorFlow) but does not provide specific version numbers for these or other libraries/solvers. |
| Experiment Setup | Yes | The word embedding size and model size are 256 and 512, respectively; in implementation, the attribute embedding and the input word embedding are shared. The number of heads n in multi-head attention is set to 8 unless otherwise stated. The model is trained with both cross-entropy loss and reinforcement learning optimizing CIDEr: first with cross-entropy loss at a batch size of 80 for 25 epochs, with early stopping based on CIDEr, followed by reinforcement learning. Adam [Kingma and Ba, 2014] with a learning rate of 10^-4 is used for parameter optimization, and beam search with beam size = 3 is applied during inference. |
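The inference procedure quoted above uses beam search with beam size 3. The paper does not release code, so the following is only a minimal illustrative sketch of that decoding strategy; `step_fn`, `toy_model`, and the toy vocabulary are hypothetical stand-ins for the captioning model's next-word distribution, not the authors' implementation.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Minimal beam search decoder.

    step_fn(seq) -> dict mapping each candidate next token to its probability.
    At every step, only the `beam_size` partial sequences with the highest
    cumulative log-probability are kept (beam size = 3 matches the paper).
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:        # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # prune to the `beam_size` best-scoring candidates
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]  # highest-scoring sequence

# Hypothetical next-token distribution standing in for the trained model.
def toy_model(seq):
    vocab = {"<s>":  {"a": 0.6, "the": 0.4},
             "a":    {"dog": 0.7, "cat": 0.3},
             "the":  {"dog": 0.2, "cat": 0.8},
             "dog":  {"</s>": 1.0},
             "cat":  {"</s>": 1.0}}
    return vocab[seq[-1]]

print(beam_search(toy_model, "<s>", "</s>"))  # → ['<s>', 'a', 'dog', '</s>']
```

In a real captioning model, `step_fn` would run the decoder over image features and return softmax scores over the full vocabulary; the pruning logic stays the same.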