Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

Authors: Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality." and "Table 1: Performance of our models and baselines in different experimental settings."
Researcher Affiliation | Collaboration | 1 Georgia Institute of Technology, 2 Oregon State University, 3 Allen Institute for AI, 4 Facebook AI Research, 5 SRI International
Pseudocode | Yes | "Predictor... using a softmax (see Algorithm 2 in the supplement for full details)" (a schematic sketch of this pool softmax follows the table)
Open Source Code | Yes | "Code has been made available at: https://github.com/mcogswell/dialog_without_dialog."
Open Datasets | Yes | "We leverage the VQAv2 [6] dataset as our language source to learn how to ask questions that humans can understand." and "By default we use VQA images (i.e., from COCO [19]), but we also construct pools using CUB (bird) images [20] and AWA (animal) images [21]."
Dataset Splits | Yes | "We find 5 epochs stops training early enough to avoid overfitting on our val set." and "Table 1 presents results on our val set for our model and baselines across the various settings described in Section 4."
Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "In stage 2.A... This stage takes 20 epochs to train. Once Q-bot learns how to track dialog we update the entire planner in stage 2.B for 5 epochs." (a sketch of this two-stage schedule also follows the table)
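
The "Pseudocode" row refers to Q-bot's predictor, which guesses the target image from a pool of candidates using a softmax. The snippet below is a minimal sketch of that idea only, not the paper's Algorithm 2 or the released code; the dot-product scorer, tensor shapes, and the names guess_target, dialog_state, and pool_features are assumptions made for illustration.

```python
# Illustrative only: a minimal pool-softmax predictor in PyTorch.
# The dot-product scorer and all names here are assumptions, not the
# paper's Algorithm 2 or the released implementation.
import torch
import torch.nn.functional as F

def guess_target(dialog_state: torch.Tensor, pool_features: torch.Tensor):
    """Return (predicted index, probabilities) over a pool of candidate images.

    dialog_state:  (D,) summary vector of Q-bot's dialog so far.
    pool_features: (P, D) one feature row per image in the pool.
    """
    logits = pool_features @ dialog_state   # (P,) one score per candidate
    probs = F.softmax(logits, dim=0)        # distribution over the pool
    return int(probs.argmax()), probs
```

For a pool of, say, four images, this yields a 4-way distribution whose argmax is Q-bot's guess of the target image.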
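
The "Experiment Setup" row describes a staged fine-tuning schedule: 20 epochs for stage 2.A (learning to track the dialog) followed by 5 epochs for stage 2.B (updating the entire planner), with 5 epochs chosen to avoid overfitting on the validation set. The loop below sketches only that schedule; finetune, train_epoch, and evaluate are hypothetical placeholders, not functions from the repository.

```python
# Hypothetical sketch of the two-stage fine-tuning schedule reported above.
# train_epoch / evaluate are placeholders, not functions from the released repo.

def finetune(q_bot, train_epoch, evaluate):
    # Stage 2.A: let Q-bot learn to track the dialog (20 epochs).
    for _ in range(20):
        train_epoch(q_bot, stage="2.A")

    # Stage 2.B: update the entire planner, stopping at 5 epochs,
    # which the paper reports is early enough to avoid overfitting on val.
    val_history = []
    for _ in range(5):
        train_epoch(q_bot, stage="2.B")
        val_history.append(evaluate(q_bot, split="val"))
    return q_bot, val_history
```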