Partially-Supervised Image Captioning

Authors: Peter Anderson, Stephen Gould, Mark Johnson

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Applying our approach to an existing neural captioning model, we achieve state of the art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
Researcher Affiliation | Academia | Peter Anderson, Macquarie University, Sydney, Australia, p.anderson@mq.edu.au [...] Stephen Gould, Australian National University, Canberra, Australia, stephen.gould@anu.edu.au [...] Mark Johnson, Macquarie University, Sydney, Australia, mark.johnson@mq.edu.au [...] Now at Georgia Tech (peter.anderson@gatech.edu)
Pseudocode | Yes | Algorithm 1 Beam search decoding [...] Algorithm 2 Constrained beam search decoding [13] (a hedged decoding sketch appears after this table)
Open Source Code | Yes | To encourage future work, we have released our code and trained models via the project website [2]. [Footnote 2: www.panderson.me/constrained-beam-search]
Open Datasets | Yes | We use the COCO 2014 captions dataset [52] containing 83K training images and 41K validation images, each labeled with five human-annotated captions. [...] object annotation labels for 25 additional animal classes from the Open Images V4 dataset [14].
Dataset Splits | Yes | We use the splits proposed by Hendricks et al. [21] for novel object captioning, in which all images with captions that mention one of eight selected objects (including synonyms and plural forms) are removed from the caption training set, which is reduced to 70K images. The original COCO validation set is split 50% for validation and 50% for testing. (A filtering sketch appears after this table.)
Hardware Specification | Yes | Training (after initialization) takes around 8 hours using two Titan X GPUs.
Software Dependencies | No | The paper mentions software components such as the 'Faster R-CNN object detector', 'ResNet-101 CNN', 'Long Short-Term Memory (LSTM) network', and 'GloVe', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | When training on image labels, we use the online version of our proposed training algorithm, constructing each minibatch of 100 with an equal number of complete and partially-specified training examples. We use SGD with an initial learning rate of 0.001, decayed to zero over 5K iterations, with a lower learning rate for the pre-trained word embeddings. In beam search and constrained beam search decoding we use a beam size of 5. (A training-setup sketch appears after this table.)
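
For readers unfamiliar with the decoding routines named in the Pseudocode row, the following is a minimal Python sketch of constrained beam search in the spirit of the paper's Algorithm 2, not the authors' released code. `step_fn` is a hypothetical stand-in for the captioning model: given a partial token sequence, it returns a mapping from candidate next tokens to log-probabilities. The real algorithm tracks constraints with a finite-state machine; this sketch uses the simpler state "set of constraint words emitted so far".

def constrained_beam_search(step_fn, constraints, beam_size=5, max_len=20,
                            eos="</s>"):
    """Best-scoring sequence whose tokens include every word in `constraints`.

    `step_fn(tokens) -> {token: log_prob}` is a hypothetical model interface.
    """
    # One beam per subset of already-satisfied constraints.
    beams = {frozenset(): [((), 0.0)]}  # state -> [(tokens, log_prob), ...]
    for _ in range(max_len):
        candidates = {}
        for state, hyps in beams.items():
            for tokens, score in hyps:
                if tokens and tokens[-1] == eos:
                    # Finished hypotheses are carried forward unchanged.
                    candidates.setdefault(state, []).append((tokens, score))
                    continue
                for tok, logp in step_fn(tokens).items():
                    new_state = state | ({tok} & constraints)
                    candidates.setdefault(new_state, []).append(
                        (tokens + (tok,), score + logp))
        # Prune to the top `beam_size` hypotheses within each constraint state.
        beams = {s: sorted(c, key=lambda h: -h[1])[:beam_size]
                 for s, c in candidates.items()}
    # Only hypotheses that satisfied all constraints are valid outputs.
    complete = beams.get(frozenset(constraints), [])
    return max(complete, key=lambda h: h[1], default=None)

With `constraints=set()` this reduces to ordinary beam search (Algorithm 1); grouping hypotheses by constraint state is what prevents the beam from discarding lower-scoring candidates that are still making progress toward mentioning the required words.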
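
The held-out split described in the Dataset Splits row can be reproduced with a caption filter along the following lines. This is a sketch, not the authors' script: the held-out object words and synonym/plural lists shown are illustrative and should be checked against Hendricks et al. [21], and the input file is assumed to follow the COCO captions JSON schema.

import json
import re

# Illustrative word forms for the eight held-out objects (an assumption;
# verify against the exact lists used by Hendricks et al. [21]).
HELD_OUT_WORDS = [
    "bottle", "bottles", "bus", "buses", "couch", "couches", "sofa", "sofas",
    "microwave", "microwaves", "pizza", "pizzas", "racket", "rackets",
    "racquet", "racquets", "suitcase", "suitcases", "luggage",
    "zebra", "zebras",
]
PATTERN = re.compile(r"\b(" + "|".join(HELD_OUT_WORDS) + r")\b", re.IGNORECASE)

def heldout_image_ids(captions_json_path):
    """Ids of images with at least one caption mentioning a held-out object."""
    with open(captions_json_path) as f:
        annotations = json.load(f)["annotations"]
    return {a["image_id"] for a in annotations if PATTERN.search(a["caption"])}

# The caption training set is the COCO 2014 training set minus these ids,
# which is the filtering that reduces it to roughly 70K images.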
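
The Experiment Setup row translates naturally into a short training-configuration sketch. PyTorch is an assumption here (the paper does not name its framework), and `model.embeddings`, `model.other_parameters()`, and the embedding learning rate of 1e-4 are hypothetical placeholders; only the batch composition, the 0.001 base rate, and the 5K-iteration decay come from the quoted text.

import random
import torch

def build_optimizer(model, base_lr=1e-3, embed_lr=1e-4, total_iters=5000):
    """SGD with a lower rate for pre-trained embeddings and linear decay.

    `embed_lr` is a placeholder: the paper says only that the rate for the
    pre-trained word embeddings is lower than the base rate.
    """
    opt = torch.optim.SGD([
        {"params": model.embeddings.parameters(), "lr": embed_lr},
        {"params": model.other_parameters(), "lr": base_lr},
    ])
    # Decay every group's learning rate linearly to zero over `total_iters`.
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda it: max(0.0, 1.0 - it / total_iters))
    return opt, sched

def mixed_batches(complete, partial, batch_size=100):
    """Minibatches with equal numbers of complete and partially-specified
    training examples, as in the quoted setup."""
    half = batch_size // 2
    random.shuffle(complete)
    random.shuffle(partial)
    for i in range(0, min(len(complete), len(partial)) - half + 1, half):
        yield complete[i:i + half] + partial[i:i + half]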