Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning
Authors: Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of our TOMS in terms of topical consistency and descriptive completeness. |
| Researcher Affiliation | Academia | Yuzhao Mao, Chang Zhou, Xiaojie Wang, Ruifan Li Center for Intelligence Science and Technology, School of Computer Science, Beijing University of Posts and Telecommunications {maoyuzhao,elani,xjwang,rfli}@bupt.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper cites external tools and frameworks (coco-caption, pytorch, neuraltalk2) but does not provide a link or statement for the source code of their proposed TOMS model. |
| Open Datasets | Yes | First are standard datasets, including Flickr8k [Hodosh et al., 2013], Flickr30k [Young et al., 2014] and COCO [Lin et al., 2014] for sentence level MS captioning and second is a paragraph dataset collected by [Krause et al., 2017] for paragraph level MS captioning. |
| Dataset Splits | Yes | The same preprocessing and data splits as previous works [Karpathy and Fei-Fei, 2015; Krause et al., 2017] are used in our experiments. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch, ResNet-152, coco-caption, and Stanford natural language parser, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We implement two layers LSTM with each hidden dimension of 512. Both topic and word embedding size are set 256. In FGU, topic and image are 512-dimensional vectors, and 1024 for context representations. Dropout is adopted in both input and output layer. |
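The quoted setup gives only dimensions (two-layer LSTM with hidden size 512, 256-d topic/word embeddings, 512-d topic and image vectors, and 1024-d context representations in the FGU). As a minimal sketch of how those dimensions fit together, the snippet below implements a *hypothetical* sigmoid fusion gate over the topic and image vectors; the paper's actual FGU equations are not reproduced in this summary, so the gating form, weight names (`Wt`, `Wi`, `Wc`), and the `fgu` function are assumptions for illustration only.

```python
import numpy as np

# Dimensions quoted in the paper's experiment setup:
# LSTM hidden size 512; topic/word embedding size 256;
# in the FGU, topic and image are 512-d, context is 1024-d.
HIDDEN, EMBED, CTX = 512, 256, 1024

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fgu(topic, image, ctx, Wt, Wi, Wc):
    """Hypothetical fusion-gate sketch: a sigmoid gate mixes the
    topic and image vectors, conditioned on the 1024-d context.
    The real FGU formulation may differ; only the vector sizes
    come from the paper."""
    g = sigmoid(Wt @ topic + Wi @ image + Wc @ ctx)  # gate, shape (512,)
    return g * topic + (1.0 - g) * image             # fused 512-d vector

# Shape check with random placeholder weights (illustration only).
rng = np.random.default_rng(0)
topic = rng.standard_normal(HIDDEN)
image = rng.standard_normal(HIDDEN)
ctx = rng.standard_normal(CTX)
Wt = rng.standard_normal((HIDDEN, HIDDEN)) * 0.01
Wi = rng.standard_normal((HIDDEN, HIDDEN)) * 0.01
Wc = rng.standard_normal((HIDDEN, CTX)) * 0.01

fused = fgu(topic, image, ctx, Wt, Wi, Wc)
print(fused.shape)  # (512,)
```

The sketch only verifies that the reported dimensions compose: a 1024-d context can condition a gate over 512-d topic and image vectors, yielding a 512-d fused representation that matches the LSTM hidden size.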