Review Networks for Caption Generation

Authors: Zhilin Yang, Ye Yuan, Yuexin Wu, William W. Cohen, Russ R. Salakhutdinov

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
Researcher Affiliation | Academia | Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen; School of Computer Science, Carnegie Mellon University; {zhiliny,yey1,yuexinw,rsalakhu,wcohen}@cs.cmu.edu
Pseudocode | No | The paper describes the model architecture and mathematical formulations through text and equations (e.g., Eq. 1, 2, 3) and provides architectural diagrams (Figure 1, Figure 2), but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data available at https://github.com/kimiyoung/review_net.
Open Datasets | Yes | We evaluate our model on the MSCOCO benchmark dataset [2] for image captioning. The dataset contains 123,000 images with at least 5 captions for each image. For offline evaluation, we use the same data split as in [7, 20, 21]... We experiment with a benchmark dataset for source code captioning, Habeas Corpus [11]. Habeas Corpus collects nine popular open-source Java code repositories...
Dataset Splits | Yes | For offline evaluation, we use the same data split as in [7, 20, 21], where we reserve 5,000 images for development and test respectively and use the rest for training. ... We randomly sample 10% of the files as the test set, 10% as the development set, and use the rest for training.
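The random 10%/10%/80% file split described for Habeas Corpus can be sketched as follows. This is a minimal illustration of the stated splitting procedure; the function name, seed, and fraction parameters are assumptions, not taken from the authors' released code:

```python
import random

def split_dataset(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Randomly split items into train/dev/test sets.

    Mirrors the paper's description: 10% of files as the test set,
    10% as the development set, and the rest for training.
    The fixed seed is an illustrative choice for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test
```

For the MSCOCO offline evaluation, the paper instead reuses a fixed published split (5,000 development and 5,000 test images) rather than resampling.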
Hardware Specification | Yes | Unlike these methods, our approach with the review network is a generic end-to-end encoder-decoder model and can be trained within six hours on a Titan X GPU.
Software Dependencies | No | The paper mentions specific CNN architectures like VGGNet [13] and Inception-v3 [16], and uses LSTM units. However, it does not provide specific version numbers for general software dependencies or libraries (e.g., 'TensorFlow 1.x' or 'Python 3.x').
Experiment Setup | Yes | We set the number of review steps T_r = 8, the weighting factor λ = 10.0, the dimension of word embeddings to be 100, the learning rate to be 1e-2, and the dimension of LSTM hidden states to be 1,024. These hyperparameters are tuned on the development set. ... For source code captioning, we set the number of review steps T_r = 8, the dimension of word embeddings to be 50, and the dimension of the LSTM hidden states to be 256.
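The reported hyperparameters for the two tasks can be collected into configuration dictionaries like the ones below. The dictionary and key names are illustrative assumptions; only the numeric values come from the paper (and the image-captioning learning rate is reconstructed as 1e-2 from a garbled extraction):

```python
# Image captioning on MSCOCO, as reported in the paper.
IMAGE_CAPTIONING = {
    "review_steps": 8,       # T_r, number of review steps
    "lambda_weight": 10.0,   # weighting factor λ
    "embedding_dim": 100,    # word embedding dimension
    "learning_rate": 1e-2,   # reconstructed from the garbled "1e 2"
    "lstm_hidden_dim": 1024, # LSTM hidden state dimension
}

# Source code captioning on Habeas Corpus, as reported in the paper.
CODE_CAPTIONING = {
    "review_steps": 8,       # T_r
    "embedding_dim": 50,
    "lstm_hidden_dim": 256,
}
```

Per the review row above, these values were tuned on the development set, so a reproduction attempt should treat them as starting points rather than fixed constants.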