Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Authors: Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. and Table 1: Performance of our models and baselines in different experimental settings. |
| Researcher Affiliation | Collaboration | 1 Georgia Institute of Technology, 2 Oregon State University, 3 Allen Institute for AI, 4 Facebook AI Research, 5 SRI International |
| Pseudocode | Yes | Predictor... using a softmax (see Algorithm 2 in the supplement for full details) |
| Open Source Code | Yes | Code has been made available at: https://github.com/mcogswell/dialog_without_dialog |
| Open Datasets | Yes | We leverage the VQAv2 [6] dataset as our language source to learn how to ask questions that humans can understand. and By default we use VQA images (i.e., from COCO [19]), but we also construct pools using CUB (bird) images [20] and AWA (animal) images [21]. |
| Dataset Splits | Yes | We find 5 epochs stops training early enough to avoid overfitting on our val set. and Table 1 presents results on our val set for our model and baselines across the various settings described in Section 4. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU models, CPU types, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In stage 2.A... This stage takes 20 epochs to train. Once Q-bot learns how to track dialog we update the entire planner in stage 2.B for 5 epochs. |
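
The experiment-setup excerpt describes a staged schedule: stage 2.A trains for 20 epochs, after which stage 2.B updates the entire planner for 5 epochs. Below is a minimal sketch of such a two-stage fine-tuning loop, not the authors' released code; the module names (`QBotPlanner`, `dialog_rnn`, `question_head`), the choice of which parameters are frozen in stage 2.A, the loss, and the learning rate are all placeholders assumed for illustration.

```python
# Hypothetical sketch of the two-stage schedule quoted above (20 epochs, then 5).
# All names and hyperparameters here are assumptions, not from the paper's repo.
import torch
import torch.nn as nn


class QBotPlanner(nn.Module):
    """Placeholder stand-in for Q-bot's planner."""

    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.dialog_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.question_head = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        out, _ = self.dialog_rnn(x)
        return self.question_head(out)


def run_stage(model, params, epochs, data):
    """Fine-tune only the given parameters of `model` for `epochs` epochs."""
    optimizer = torch.optim.Adam(params, lr=1e-4)
    criterion = nn.MSELoss()  # placeholder objective
    for _ in range(epochs):
        for inputs, targets in data:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()


model = QBotPlanner()
# Toy batches standing in for dialog data: (input sequence, target) pairs.
data = [(torch.randn(8, 4, 512), torch.randn(8, 4, 512)) for _ in range(2)]

# Stage 2.A: 20 epochs, here updating only the dialog-tracking RNN (an assumption).
run_stage(model, model.dialog_rnn.parameters(), epochs=20, data=data)
# Stage 2.B: 5 epochs, updating the entire planner.
run_stage(model, model.parameters(), epochs=5, data=data)
```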