Generative Visual Dialogue System via Weighted Likelihood Estimation
Authors: Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on the VisDial benchmark demonstrate the superiority of our proposed algorithm over other state-of-the-art approaches, with an improvement of 5.81% on recall@10. |
| Researcher Affiliation | Collaboration | 1University of Southern California 2Samsung Research America 3Arizona State University |
| Pseudocode | No | The paper describes methods and equations, but it does not include a distinct pseudocode block or algorithm section. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We evaluate our proposed model on the VisDial dataset [Das et al., 2017]. In VisDial v0.9, on which most previous work has benchmarked, there are in total 83k and 40k dialogues on COCO-train and COCO-val images, respectively. |
| Dataset Splits | Yes | We follow the methodology in [Lu et al., 2017] and split the data into 82k for train, 1k for val and 40k for test. In the new version VisDial v1.0, which was used for the Visual Dialog Challenge 2018, train consists of the previous 123k images and corresponding dialogues. 2k and 8k images with dialogues are collected for val and test, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models or CPU specifications. It mentions 'pre-trained CNN models (VGG, ResNet)', which are software models, not hardware. |
| Software Dependencies | No | The paper mentions software components like 'LSTM decoder' and 'Adam optimizer' but does not provide specific version numbers for these or for other libraries/frameworks (e.g., PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | We use 512D word embeddings, which are trained from scratch and shared by question, dialogue history and decoder LSTMs. We also set all LSTMs to have single layer with 512D hidden state for consistency with other works. We use the Adam optimizer with the base learning rate of 4 × 10⁻⁴. |
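
The Experiment Setup row above amounts to a small set of hyperparameters: shared 512D word embeddings, single-layer LSTMs with 512D hidden states for the question, dialogue-history, and decoder branches, and Adam with a base learning rate of 4 × 10⁻⁴. The following is a minimal sketch of that configuration, assuming PyTorch (the paper does not name its framework); the vocabulary size and all module names are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary size; the quoted setup does not report one.
VOCAB_SIZE = 10000
EMBED_DIM = 512    # 512D word embeddings, trained from scratch
HIDDEN_DIM = 512   # single-layer LSTMs with 512D hidden state

# One embedding table shared by the question, dialogue-history, and decoder LSTMs.
shared_embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

question_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
history_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
decoder_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)

params = (
    list(shared_embedding.parameters())
    + list(question_lstm.parameters())
    + list(history_lstm.parameters())
    + list(decoder_lstm.parameters())
)

# Adam with the base learning rate reported in the paper (4 × 10⁻⁴).
optimizer = torch.optim.Adam(params, lr=4e-4)
```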