SentiCap: Generating Image Descriptions with Sentiments
Authors: Alexander Mathews, Lexing Xie, Xuming He
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the captions with different automatic and crowd-sourcing metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions 88% were confirmed by the crowd-sourced workers as having the appropriate sentiment. Trained on 2000+ sentimental captions and 413K neutral captions, our switching RNN outperforms a range of heuristic and learned baselines in the number of emotional captions generated, and in a number of caption evaluation metrics. |
| Researcher Affiliation | Collaboration | Alexander Mathews, Lexing Xie, Xuming He; The Australian National University, NICTA. alex.mathews@anu.edu.au, lexing.xie@anu.edu.au, xuming.he@nicta.com.au |
| Pseudocode | No | The paper includes equations and diagrams but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions releasing a list of ANPs and captions for their dataset (Section 4, footnote 1: "http://users.cecs.anu.edu.au/~u4534172/senticap.html") but does not explicitly state or link to the source code for the SentiCap methodology itself. |
| Open Datasets | Yes | We build upon the CNN+RNN (Convolutional Neural Network + Recurrent Neural Network) recipe that has seen many recent successes (Donahue et al. 2015; Karpathy and Fei-Fei 2015; Mao et al. 2015; Vinyals et al. 2015; Xu et al. 2015a). We have gathered a new dataset of several thousand captions with positive and negative sentiments by re-writing factual descriptions (Section 4). Trained on 2000+ sentimental captions and 413K neutral captions... We build upon Visual SentiBank to construct a sentiment vocabulary, but to the best of our knowledge, no existing work tries to compose image descriptions with desired sentiments. We expand the Visual SentiBank (Borth et al. 2013) vocabulary with a set of ANPs from the YFCC100M image captions (Thomee et al. 2015), as the overlap between the original SentiBank ANPs and the MSCOCO images is insufficient. (An illustrative ANP-mining sketch appears after this table.) |
| Dataset Splits | Yes | The background RNN is learned on the MSCOCO training set (Chen et al. 2015) of 413K+ sentences on 82K+ images. We construct an additional set of captions with sentiments as described in Section 4 using images from the MSCOCO validation partition. The POS subset contains 2,873 positive sentences and 998 images for training, and another 2,019 sentences over 673 images for testing. The NEG subset contains 2,468 negative sentences and 997 images for training, and another 1,509 sentences over 503 images for testing. We automatically search for the hyperparameters λ_θ, λ_η and λ_γ on a validation set using Whetlab (Snoek, Larochelle, and Adams 2012). (The reported split sizes are summarised in a sketch after this table.) |
| Hardware Specification | Yes | We implemented the system on a multicore workstation with an Nvidia K40 GPU. |
| Software Dependencies | No | We implement RNNs with LSTM units using the Theano package (Bastien et al. 2012). The paper mentions the software package 'Theano' but does not specify its version number. |
| Experiment Setup | Yes | Mini-batches of size 128 are used with a fixed momentum of 0.99 and a fixed learning rate of 0.001. Gradients are clipped to the range [-5, 5] for all weights during back-propagation. We use perplexity as our stopping criteria. We automatically search for the hyperparameters λ_θ, λ_η and λ_γ on a validation set using Whetlab (Snoek, Larochelle, and Adams 2012). (A framework-agnostic sketch of this optimisation setup appears after this table.) |
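
The vocabulary-expansion step described in the Open Datasets row (mining adjective-noun pairs from YFCC100M captions to extend Visual SentiBank) is not released as code. The following is a minimal, hypothetical sketch of ANP mining from raw captions using NLTK part-of-speech tags; the tag sets, resource names, and the `extract_anps` helper are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical sketch: mining adjective-noun pairs (ANPs) from captions.
# This is NOT the authors' code; it only illustrates the general idea of
# expanding a sentiment vocabulary (e.g. Visual SentiBank) with ANPs found
# in a large caption corpus such as YFCC100M.
from collections import Counter
import nltk

# Tokeniser/tagger models; exact resource names vary across NLTK versions,
# so we try both the older and newer names (failed downloads are ignored).
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

ADJ_TAGS = {"JJ", "JJR", "JJS"}   # adjectives
NOUN_TAGS = {"NN", "NNS"}         # common nouns

def extract_anps(caption):
    """Return (adjective, noun) pairs where the adjective directly precedes the noun."""
    tagged = nltk.pos_tag(nltk.word_tokenize(caption.lower()))
    return [(w1, w2)
            for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
            if t1 in ADJ_TAGS and t2 in NOUN_TAGS]

captions = [
    "a happy dog plays on the sunny beach",
    "an abandoned house sits on a dead street",
]
anp_counts = Counter(p for c in captions for p in extract_anps(c))
print(anp_counts.most_common())   # e.g. [(('happy', 'dog'), 1), ...]
```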
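
The SentiCap partition sizes quoted in the Dataset Splits row can be turned into a small consistency check when reproducing the experiments. Only the expected counts below come from the paper; the loader interface (a list of `(image_id, caption)` pairs and the hypothetical `load_senticap` helper) is an assumption for illustration.

```python
# Reported SentiCap split sizes (from the Dataset Splits row above).
EXPECTED_SPLITS = {
    ("pos", "train"): {"sentences": 2873, "images": 998},
    ("pos", "test"):  {"sentences": 2019, "images": 673},
    ("neg", "train"): {"sentences": 2468, "images": 997},
    ("neg", "test"):  {"sentences": 1509, "images": 503},
}

def check_split(pairs, sentiment, split):
    """Compare a list of (image_id, caption) pairs against the reported counts."""
    expected = EXPECTED_SPLITS[(sentiment, split)]
    n_sentences = len(pairs)
    n_images = len({image_id for image_id, _ in pairs})
    return n_sentences == expected["sentences"] and n_images == expected["images"]

# Example: check_split(load_senticap("pos", "train"), "pos", "train")
# where load_senticap is whatever loader you write for the released data.
```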
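
The Experiment Setup row fixes the optimisation hyperparameters (mini-batch size 128, momentum 0.99, learning rate 0.001, gradient clipping to [-5, 5], perplexity-based stopping). The paper implemented these in Theano; the sketch below is only a framework-agnostic NumPy rendering of that update rule and stopping check, with the function names and the `patience` value chosen here for illustration.

```python
import numpy as np

# Hyperparameters as reported in the paper.
BATCH_SIZE = 128
LEARNING_RATE = 0.001
MOMENTUM = 0.99
CLIP_LOW, CLIP_HIGH = -5.0, 5.0

def sgd_momentum_step(params, grads, velocity):
    """One SGD-with-momentum update with element-wise gradient clipping."""
    for name in params:
        g = np.clip(grads[name], CLIP_LOW, CLIP_HIGH)   # clip to [-5, 5]
        velocity[name] = MOMENTUM * velocity[name] - LEARNING_RATE * g
        params[name] += velocity[name]

def perplexity(mean_neg_log_likelihood):
    """Perplexity from the mean per-word negative log-likelihood (natural log)."""
    return float(np.exp(mean_neg_log_likelihood))

def should_stop(val_perplexities, patience=3):
    """Stop once validation perplexity has not improved for `patience` evaluations."""
    if len(val_perplexities) <= patience:
        return False
    return min(val_perplexities[-patience:]) >= min(val_perplexities[:-patience])
```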