Diverse Image Captioning with Context-Object Split Latent Spaces

Authors: Shweta Mahajan, Stefan Roth

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response (evidence quoted from the paper)
Research Type | Experimental | "We evaluate our COS-CVAE approach on the standard COCO dataset and on the held-out COCO dataset consisting of images with novel objects, showing significant gains in accuracy and diversity." (Sec. 4, Experiments): "To show the advantages of our method for diverse and accurate image captioning, we perform experiments on the COCO dataset [29]"
Researcher Affiliation | Academia | "Shweta Mahajan, Stefan Roth, Dept. of Computer Science, TU Darmstadt, {mahajan@aiphes, stefan.roth@visinf}.tu-darmstadt.de"
Pseudocode | No | The paper presents architectural diagrams (Fig. 2) and descriptive text for the approach but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code available at https://github.com/visinf/cos-cvae"
Open Datasets | Yes | "We evaluate our COS-CVAE approach on the standard COCO dataset and on the held-out COCO dataset consisting of images with novel objects, showing significant gains in accuracy and diversity. [We] perform experiments on the COCO dataset [29], consisting of 82 783 training and 40 504 validation images, each with five captions. We additionally perform experiments on the held-out COCO dataset [17]"
Dataset Splits | Yes | "Consistent with [6, 14, 44], we use 118 287 train, 4000 validation, and 1000 test images. We additionally perform experiments on the held-out COCO dataset [17] to show that our COS-CVAE framework can be extended to training on images with novel objects. This dataset is a subset of the COCO dataset and excludes all the image-text pairs containing at least one of the eight specific objects (in any one of the human annotations) in COCO: bottle, bus, couch, microwave, pizza, racket, suitcase, and zebra. The training set consists of 70 000 images. For this setting, COCO validation [29] is split into two equal halves for validation and test data." A minimal sketch of this held-out filtering appears after the table.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or library versions).
Experiment Setup | No | The paper states "We consider 20 and 100 samples of z, consistent with prior work." and "More details can be found in the supplemental material." but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings in the main text. A sketch of this multi-sample evaluation protocol appears after the table.
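
To make the "Dataset Splits" row concrete, here is a minimal sketch of the held-out filtering it quotes, assuming pycocotools and the COCO 2014 caption annotations. The substring match over captions is our simplification of "containing at least one of the eight specific objects (in any one of the human annotations)", not the authors' exact pipeline, and the annotation file path is a placeholder.

```python
# Hedged sketch: rebuilding the held-out COCO split from the "Dataset Splits" row.
# Assumes pycocotools is installed and the COCO 2014 caption annotations are local.
from pycocotools.coco import COCO

HELD_OUT = ["bottle", "bus", "couch", "microwave",
            "pizza", "racket", "suitcase", "zebra"]

coco = COCO("annotations/captions_train2014.json")  # placeholder path

def mentions_held_out(img_id):
    """True if any human caption for this image names a held-out object."""
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    return any(obj in ann["caption"].lower() for ann in anns for obj in HELD_OUT)

# Keep only image-text pairs that never mention a held-out object.
train_ids = [i for i in coco.getImgIds() if not mentions_held_out(i)]
print(f"{len(train_ids)} training images remain after filtering")
```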
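
The "Experiment Setup" row quotes drawing 20 and 100 samples of z per image at evaluation time. The sketch below shows that protocol for a generic conditional-VAE captioner; `prior_net` and `decoder` are hypothetical stand-ins, since the paper's context-object split latent space and its hyperparameters are not specified in the main text.

```python
# Hedged sketch of multi-sample caption generation: one decoded caption per
# latent sample z ~ p(z | image). Module names and shapes are hypothetical.
import torch

def sample_captions(image_features, prior_net, decoder, num_samples=20):
    """Decode num_samples captions, one per sample from the conditional prior."""
    mu, logvar = prior_net(image_features)           # conditional prior parameters
    std = torch.exp(0.5 * logvar)
    captions = []
    for _ in range(num_samples):
        z = mu + std * torch.randn_like(std)         # reparameterized sample
        captions.append(decoder(image_features, z))  # one caption per z
    return captions
```

Accuracy and diversity metrics would then be computed over the 20 or 100 decoded captions per image, consistent with the quoted evaluation setting.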