Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

Authors: Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of our TOMS in terms of topical consistency and descriptive completeness.
Researcher Affiliation | Academia | Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li, Center for Intelligence Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications, {maoyuzhao,elani,xjwang,rfli}@bupt.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper cites external tools and frameworks (coco-caption, PyTorch, neuraltalk2) but does not provide a link or availability statement for the source code of the proposed TOMS model.
Open Datasets | Yes | First are standard datasets, including Flickr8k [Hodosh et al., 2013], Flickr30k [Young et al., 2014], and COCO [Lin et al., 2014], for sentence-level MS captioning; second is a paragraph dataset collected by [Krause et al., 2017] for paragraph-level MS captioning.
Dataset Splits | Yes | The same preprocessing and data splits as previous works [Karpathy and Fei-Fei, 2015; Krause et al., 2017] are used in our experiments.
Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments.
Software Dependencies | No | The paper mentions software like PyTorch, ResNet-152, coco-caption, and the Stanford natural language parser, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We implement a two-layer LSTM with a hidden dimension of 512 in each layer. Both topic and word embedding sizes are set to 256. In the FGU, topic and image representations are 512-dimensional vectors, and context representations are 1024-dimensional. Dropout is adopted in both the input and output layers.
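For readers checking whether the reported configuration is internally consistent, below is a minimal PyTorch sketch that wires together the hyperparameters quoted in the Experiment Setup row: a two-layer LSTM with 512 hidden units, 256-dimensional topic and word embeddings, 512-dimensional topic and image vectors fused into a 1024-dimensional context, and dropout on the input and output layers. This is not the authors' released code; the class name TopicOrientedDecoder, the 2048-dimensional ResNet-152 image feature, the simple sigmoid-gated fusion standing in for the paper's FGU, and the vocabulary/topic counts in the usage example are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): a two-layer LSTM decoder built from
# the hyperparameters quoted in the Experiment Setup row. Names such as
# TopicOrientedDecoder and the gated fusion below are hypothetical; the
# paper's exact FGU formulation is only approximated.
import torch
import torch.nn as nn


class TopicOrientedDecoder(nn.Module):
    def __init__(self, vocab_size, num_topics,
                 embed_size=256, hidden_size=512, context_size=1024,
                 dropout=0.5):
        super().__init__()
        # Topic and word embeddings, both 256-dimensional as reported.
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.topic_embed = nn.Embedding(num_topics, embed_size)
        # Project topic and image features to 512-d, matching the FGU description.
        self.topic_proj = nn.Linear(embed_size, hidden_size)
        self.image_proj = nn.Linear(2048, hidden_size)  # assumed ResNet-152 feature size
        # Gate that mixes topic and image into a 1024-d context vector
        # (a plain sigmoid-gated sum; the paper's FGU may differ).
        self.gate = nn.Linear(2 * hidden_size, 2 * hidden_size)
        assert 2 * hidden_size == context_size
        # Two-layer LSTM with hidden dimension 512; dropout on input and output.
        self.in_drop = nn.Dropout(dropout)
        self.lstm = nn.LSTM(embed_size + context_size, hidden_size,
                            num_layers=2, batch_first=True)
        self.out_drop = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, words, topic_ids, image_feats):
        # words: (B, T) token ids, topic_ids: (B,), image_feats: (B, 2048)
        t = self.topic_proj(self.topic_embed(topic_ids))      # (B, 512)
        v = self.image_proj(image_feats)                      # (B, 512)
        concat = torch.cat([t, v], dim=-1)                    # (B, 1024)
        context = torch.sigmoid(self.gate(concat)) * concat   # gated fusion
        w = self.in_drop(self.word_embed(words))              # (B, T, 256)
        ctx = context.unsqueeze(1).expand(-1, w.size(1), -1)  # (B, T, 1024)
        out, _ = self.lstm(torch.cat([w, ctx], dim=-1))       # (B, T, 512)
        return self.fc(self.out_drop(out))                    # (B, T, vocab)


if __name__ == "__main__":
    # Hypothetical vocabulary and topic counts, chosen only for the smoke test.
    model = TopicOrientedDecoder(vocab_size=10000, num_topics=80)
    logits = model(torch.randint(0, 10000, (2, 12)),
                   torch.randint(0, 80, (2,)),
                   torch.randn(2, 2048))
    print(logits.shape)  # torch.Size([2, 12, 10000])
```

Running the script prints torch.Size([2, 12, 10000]), which at least confirms that the dimensions quoted in the setup (256, 512, 1024) fit together in a single decoder without contradiction.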