Review Networks for Caption Generation
Authors: Zhilin Yang, Ye Yuan, Yuexin Wu, William W. Cohen, Russ R. Salakhutdinov
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning. |
| Researcher Affiliation | Academia | Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen School of Computer Science Carnegie Mellon University {zhiliny,yey1,yuexinw,rsalakhu,wcohen}@cs.cmu.edu |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations through text and equations (e.g., Eq. 1, 2, 3) and provides architectural diagrams (Figure 1, Figure 2), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data available at https://github.com/kimiyoung/review_net. |
| Open Datasets | Yes | We evaluate our model on the MSCOCO benchmark dataset [2] for image captioning. The dataset contains 123,000 images with at least 5 captions for each image. For offline evaluation, we use the same data split as in [7, 20, 21]... We experiment with a benchmark dataset for source code captioning, Habeas Corpus [11]. Habeas Corpus collects nine popular open-source Java code repositories... |
| Dataset Splits | Yes | For offline evaluation, we use the same data split as in [7, 20, 21], where we reserve 5,000 images for development and test respectively and use the rest for training. ... We randomly sample 10% of the files as the test set, 10% as the development set, and use the rest for training. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | Unlike these methods, our approach with the review network is a generic end-to-end encoder-decoder model and can be trained within six hours on a Titan X GPU. |
| Software Dependencies | No | The paper mentions specific CNN architectures like VGGNet [13] and Inception-v3 [16], and uses LSTM units. However, it does not provide specific version numbers for general software dependencies or libraries (e.g., 'TensorFlow 1.x' or 'Python 3.x'). |
| Experiment Setup | Yes | We set the number of review steps Tr = 8, the weighting factor λ = 10.0, the dimension of word embeddings to be 100, the learning rate to be 1e-2, and the dimension of LSTM hidden states to be 1,024. These hyperparameters are tuned on the development set. ... We set the number of review steps Tr = 8, the dimension of word embeddings to be 50, and the dimension of the LSTM hidden states to be 256. (Both settings are collected into a config sketch after the table.) |
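As a concrete reading of the Dataset Splits row, here is a minimal sketch in Python of the two splitting schemes. The helper names, seed, and input lists are illustrative assumptions, not from the paper or its repository; note also that for MSCOCO the paper reuses the fixed split of [7, 20, 21] rather than shuffling, so the shuffle below merely stands in for that fixed assignment.

```python
import random

def split_mscoco(image_ids, n_dev=5000, n_test=5000, seed=0):
    """Offline MSCOCO evaluation: reserve 5,000 images each for dev and
    test, train on the rest. The paper uses the fixed split of [7, 20, 21];
    the shuffle here is only a stand-in for that assignment."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test

def split_habeas_corpus(files, dev_frac=0.10, test_frac=0.10, seed=0):
    """Habeas Corpus: randomly sample 10% of files as the test set,
    10% as the dev set, and train on the rest, as quoted above."""
    rng = random.Random(seed)
    fs = list(files)
    rng.shuffle(fs)
    n_test = int(len(fs) * test_frac)
    n_dev = int(len(fs) * dev_frac)
    test = fs[:n_test]
    dev = fs[n_test:n_test + n_dev]
    train = fs[n_test + n_dev:]
    return train, dev, test
```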
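For quick reference, the hyperparameters quoted in the Experiment Setup row collected in one place. Only the values come from the paper; the dictionary layout and key names are our own assumptions, not the released code's.

```python
# Hyperparameters from the Experiment Setup row; the dict structure and
# key names are illustrative, not taken from the paper's released code.
IMAGE_CAPTIONING = {
    "review_steps": 8,       # Tr, number of review steps
    "lambda_weight": 10.0,   # weighting factor lambda
    "word_embed_dim": 100,
    "learning_rate": 1e-2,
    "lstm_hidden_dim": 1024,
}

SOURCE_CODE_CAPTIONING = {
    "review_steps": 8,       # Tr, number of review steps
    "word_embed_dim": 50,
    "lstm_hidden_dim": 256,
}
```

Per the quote, these values were tuned on the respective development sets.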