Teaching Machines to Describe Images with Natural Language Feedback

Authors: Huan Ling, Sanja Fidler

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a hierarchical phrase-based RNN as our image captioning model, and design a feedback network that provides reward to the learner by conditioning on the human-provided feedback. We show that by exploiting descriptive feedback on new images our model learns to perform better than when given human written captions on these images.
Researcher Affiliation | Academia | Huan Ling¹, Sanja Fidler¹,² (¹University of Toronto, ²Vector Institute); {linghuan,fidler}@cs.toronto.edu
Pseudocode | No | The paper describes the model computationally with equations and function definitions, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Our code and data will be released (http://www.cs.toronto.edu/~linghuan/feedbackImageCaption/) to facilitate more human-like training of captioning models.
Open Datasets | Yes | To train our hierarchical model, we first process MS-COCO image caption data [20] using the Stanford Core NLP toolkit [23].
Dataset Splits | Yes | We use 82K images for training, 2K for validation, and 4K for testing. In particular, we randomly chose 2K val and 4K test images from the official validation split.
Hardware Specification | No | The paper thanks NVIDIA for their donation of the GPUs in the acknowledgments, but does not specify the exact GPU models, CPU, or other hardware components used for experiments.
Software Dependencies | No | The paper mentions tools like the Stanford Core NLP toolkit and the ADAM optimizer but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We use the ADAM optimizer [9] with learning rate 0.001. We use Adam with learning rate 1e-6 and batch size 50. As in [29], we follow an annealing schedule. We first optimize the cross entropy loss for the first K epochs, then for the following t = 1, ..., T epochs, we use cross entropy loss for the first (P - floor(t/m)) phrases (where P denotes the number of phrases), and the policy gradient algorithm for the remaining floor(t/m) phrases. We choose m = 5.
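
Below is a minimal Python sketch of the annealing schedule quoted in the Experiment Setup row, assuming a caption decomposed into P phrases and two user-supplied loss callables (xe_loss for cross entropy, pg_loss for policy gradient); the function and variable names are illustrative and not taken from the authors' released code.

    def schedule_phrase_losses(phrases, epoch, K, xe_loss, pg_loss, m=5):
        """Choose a per-phrase loss following the quoted annealing schedule.

        Epochs 0..K-1 use cross entropy for every phrase. For the t-th epoch
        after warm-up (t = 1, ..., T), the first P - floor(t/m) phrases keep
        cross entropy and the last floor(t/m) phrases switch to policy gradient.
        """
        P = len(phrases)
        if epoch < K:
            return [xe_loss(p) for p in phrases]
        t = epoch - K + 1              # t = 1, ..., T after the warm-up epochs
        n_pg = min(t // m, P)          # floor(t/m), capped at the phrase count
        n_xe = P - n_pg
        return [xe_loss(p) if i < n_xe else pg_loss(p)
                for i, p in enumerate(phrases)]

    # Toy usage with stand-in losses; real ones would score the RNN's phrase
    # outputs against the reference caption or the feedback-network reward.
    phrases = ["a man", "riding a horse", "on the beach"]
    losses = schedule_phrase_losses(phrases, epoch=12, K=5,
                                    xe_loss=lambda p: ("XE", p),
                                    pg_loss=lambda p: ("PG", p))
    print(losses)  # with t = 8 and m = 5, the last floor(8/5) = 1 phrase uses PG

With m = 5, one additional phrase is handed over to the policy gradient objective every five epochs, so training anneals gradually from fully supervised decoding to reinforcement learning on all phrases.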