Generative Visual Dialogue System via Weighted Likelihood Estimation

Authors: Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on the VisDial benchmark demonstrate the superiority of our proposed algorithm over other state-of-the-art approaches, with an improvement of 5.81% on recall@10.
Researcher Affiliation | Collaboration | 1 University of Southern California, 2 Samsung Research America, 3 Arizona State University
Pseudocode | No | The paper describes methods and equations, but it does not include a distinct pseudocode block or algorithm section.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We evaluate our proposed model on the VisDial dataset [Das et al., 2017]. In VisDial v0.9, on which most previous work has benchmarked, there are in total 83k and 40k dialogues on COCO-train and COCO-val images, respectively.
Dataset Splits | Yes | We follow the methodology in [Lu et al., 2017] and split the data into 82k for train, 1k for val and 40k for test. In the new version, VisDial v1.0, which was used for the Visual Dialog Challenge 2018, train consists of the previous 123k images and corresponding dialogues; 2k and 8k images with dialogues are collected for val and test, respectively.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models or CPU specifications. It mentions "pre-trained CNN models (VGG, ResNet)", which are software models, not hardware.
Software Dependencies | No | The paper mentions software components such as an "LSTM decoder" and the "Adam optimizer" but does not provide specific version numbers for these or for other libraries/frameworks (e.g., a PyTorch or TensorFlow version).
Experiment Setup | Yes | We use 512D word embeddings, which are trained from scratch and shared by the question, dialogue history and decoder LSTMs. We also set all LSTMs to have a single layer with a 512D hidden state for consistency with other works. We use the Adam optimizer with a base learning rate of 4 × 10^-4.
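
The quoted experiment setup fixes only the embedding size, the single-layer 512D LSTMs, and the Adam learning rate; the authors' code is not released (see the Open Source Code row). Below is a minimal sketch of how that configuration could be wired up, assuming PyTorch. The class name, the additive fusion of encoder states, the vocabulary size, and the omission of the image-feature branch and the weighted likelihood loss are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Hyper-parameters taken from the "Experiment Setup" row above.
EMBED_DIM = 512    # word-embedding size, trained from scratch
HIDDEN_DIM = 512   # single-layer LSTM hidden-state size
BASE_LR = 4e-4     # Adam base learning rate


class GenerativeDialogueSketch(nn.Module):
    """Rough encoder-decoder skeleton; layer names and fusion are illustrative only."""

    def __init__(self, vocab_size: int):
        super().__init__()
        # One embedding table shared by the question, history, and decoder LSTMs.
        self.embed = nn.Embedding(vocab_size, EMBED_DIM)
        self.question_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
        self.history_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
        self.decoder_lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)
        self.output = nn.Linear(HIDDEN_DIM, vocab_size)

    def forward(self, question_tokens, history_tokens, answer_tokens):
        _, (q_state, _) = self.question_lstm(self.embed(question_tokens))
        _, (h_state, _) = self.history_lstm(self.embed(history_tokens))
        # Fuse encoder states to initialize the decoder (additive fusion is an assumption;
        # the paper also fuses image features, omitted here).
        init_h = q_state + h_state
        init_c = torch.zeros_like(init_h)
        dec_out, _ = self.decoder_lstm(self.embed(answer_tokens), (init_h, init_c))
        return self.output(dec_out)  # per-token logits for the generated answer


# Hypothetical vocabulary size; the optimizer setting mirrors the quoted setup.
model = GenerativeDialogueSketch(vocab_size=10_000)
optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)
```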