Attention Correctness in Neural Image Captioning

Authors: Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show on the popular Flickr30k and COCO datasets that introducing supervision of attention maps during training solidly improves both attention correctness and caption quality, showing the promise of making machine perception more human-like.
Researcher Affiliation | Academia | Chenxi Liu (Johns Hopkins University), Junhua Mao (University of California, Los Angeles), Fei Sha (University of Southern California), Alan Yuille (Johns Hopkins University and University of California, Los Angeles)
Pseudocode | No | The paper describes the models and processes using mathematical equations and textual descriptions, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to "publicly available code" from a prior work (Xu et al. 2015) that the authors build upon, but they do not state that they are releasing their own code for the proposed methodology. The link provided (https://github.com/kelvinxu/arctic-captions) is for the baseline model.
Open Datasets | Yes | Flickr8k (Hodosh, Young, and Hockenmaier 2013), Flickr30k (Young et al. 2014), and MS COCO (Lin et al. 2014) are the most commonly used benchmark datasets.
Dataset Splits | No | The paper mentions using the Flickr30k and MS COCO datasets and specifies that experiments are conducted on the 1000 test images of Flickr30k, but it does not explicitly provide counts or percentages for the training and validation splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions using VGG net features pretrained on ImageNet.
Software Dependencies | No | The paper mentions using the Adam algorithm for training and Dropout for regularization, and refers to hyperparameters from the publicly available code for the baseline model. However, it does not provide version numbers for any software dependencies or libraries used in the authors' own implementation.
Experiment Setup | Yes | The model is trained using stochastic gradient descent with the Adam algorithm (Kingma and Ba 2014). Dropout (Srivastava et al. 2014) is used as regularization. We set the number of LSTM units to 1300 for Flickr30k and 1800 for COCO.
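The "supervision of attention maps during training" noted in the Research Type row amounts to adding an extra loss term that pushes the model's attention distribution toward a ground-truth one. The sketch below is a minimal NumPy illustration of that idea, not the paper's actual implementation; the function names, the cross-entropy form of the penalty, and the balancing weight `lam` are all assumptions for illustration:

```python
import numpy as np

def attention_supervision_loss(pred_attn, true_attn, eps=1e-12):
    # Cross-entropy between a ground-truth attention map and the model's
    # predicted map, both normalized distributions over image regions.
    # pred_attn, true_attn: shape (num_regions,), each summing to 1.
    return -np.sum(true_attn * np.log(pred_attn + eps))

def total_loss(caption_nll, pred_attn, true_attn, lam=1.0):
    # Captioning loss plus a weighted attention-supervision term.
    # `lam` is a hypothetical balancing hyperparameter, not from the paper.
    return caption_nll + lam * attention_supervision_loss(pred_attn, true_attn)

# Toy example: a 4-region image where ground truth focuses on region 2.
true_attn = np.array([0.0, 0.1, 0.8, 0.1])
good_pred = np.array([0.05, 0.1, 0.75, 0.1])  # attends to the right region
bad_pred = np.array([0.7, 0.1, 0.1, 0.1])     # attends to the wrong region
assert attention_supervision_loss(good_pred, true_attn) < \
       attention_supervision_loss(bad_pred, true_attn)
```

In this toy case the penalty is lower when the predicted map agrees with the ground truth, which is the behavior the extra supervision term is meant to reward.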
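The Experiment Setup row quotes the only concrete hyperparameters the paper gives: Adam, Dropout, and a dataset-dependent LSTM size. A hedged Python sketch of how one might record that setup as a configuration helper; the function and key names are illustrative, and any value not quoted in the row above is a placeholder:

```python
def make_train_config(dataset):
    # Training hyperparameters reflecting the quoted setup: Adam optimizer,
    # dropout regularization, and LSTM size per dataset (1300 for Flickr30k,
    # 1800 for COCO). All other structure here is an assumption.
    lstm_units = {"flickr30k": 1300, "coco": 1800}
    if dataset not in lstm_units:
        raise ValueError(f"unknown dataset: {dataset}")
    return {
        "optimizer": "adam",          # Kingma and Ba 2014
        "regularization": "dropout",  # Srivastava et al. 2014
        "lstm_units": lstm_units[dataset],
    }

cfg = make_train_config("coco")
# cfg["lstm_units"] == 1800
```

Keeping the dataset-dependent size in one table makes it harder to silently train a COCO-sized model on Flickr30k, which is exactly the kind of detail a reproduction attempt would need pinned down.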