Open Domain Dialogue Generation with Latent Images
Authors: Ze Yang, Wei Wu, Huang Hu, Can Xu, Wei Wang, Zhoujun Li
AAAI 2021, pp. 14239-14247
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies are conducted in both image-grounded conversation and text-based conversation. |
| Researcher Affiliation | Collaboration | (1) State Key Lab of Software Development Environment, Beihang University, Beijing, China; (2) Meituan, Beijing, China; (3) Microsoft, Beijing, China; (4) China Resources Group, Shenzhen, China |
| Pseudocode | No | The paper describes the models and their components, and includes a model architecture diagram (Figure 2), but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides links to evaluation scripts and baseline implementations (e.g., 'https://github.com/Maluuba/nlg-eval', 'https://github.com/IBM/pytorch-seq2seq'), but gives no statement or link releasing the source code of the authors' proposed method (IMGVAE). |
| Open Datasets | Yes | For image-grounded dialogue set DI, we choose Image-Chat data published in (Shuster et al. 2020)... For the textual dialogue set DT, we use the Reddit Conversation Corpus published by (Dziri et al. 2018) |
| Dataset Splits | Yes | The training/validation/test sets are split into 186,782/5,000/9,997 respectively... we randomly sample 1M/20K/20K dialogues as the training/validation/test set of the Reddit data. |
| Hardware Specification | Yes | Our model is trained on 4 Tesla 32GB P40 GPUs in a data-parallel manner with batch size 100. |
| Software Dependencies | No | The paper mentions using the Adam algorithm and implies the use of PyTorch through baseline implementations, but does not specify version numbers for any software dependencies used in their own model's implementation. |
| Experiment Setup | Yes | In both tasks, d1, d2, d3, and d4 are set as 512, 48, 768, and 300 respectively. The image reconstructor has 2 attentional visual refiners (i.e. m = 2), and the numbers of image sub-regions N0 and N1 are set as 64 × 64 and 128 × 128 respectively. The dimension of ϵ and the dimension of the augmented conditioning vector are set as 100. ...We learn all models using Adam algorithm (Kingma and Ba 2015) and the learning rates for image reconstructor and response generator are set as 1 × 10^-4 and 1 × 10^-3 respectively. |
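The Experiment Setup row reports concrete hyperparameters. Below is a minimal PyTorch sketch of how that configuration could be wired up; it is not the authors' IMGVAE implementation (no code was released). The module definitions, variable names, and layer choices are hypothetical placeholders, and only the numeric values (the dimensions d1 through d4, the noise dimension, the batch size, and the two Adam learning rates) come from the paper.

```python
# Hedged sketch of the reported training configuration.
# Placeholder modules only; the real IMGVAE architecture is not reproduced here.
import torch
from torch import nn

# Hyperparameters reported in the paper.
D1, D2, D3, D4 = 512, 48, 768, 300    # hidden dimensions d1..d4
NOISE_DIM = 100                       # dimension of epsilon / augmented conditioning vector
SUB_REGIONS = [(64, 64), (128, 128)]  # N0 and N1 for the m = 2 attentional visual refiners
BATCH_SIZE = 100

# Hypothetical stand-ins for the image reconstructor and response generator.
image_reconstructor = nn.Sequential(
    nn.Linear(D1 + NOISE_DIM, D3), nn.ReLU(), nn.Linear(D3, D4)
)
response_generator = nn.Sequential(
    nn.Linear(D3, D1), nn.ReLU(), nn.Linear(D1, D2)
)

# Separate Adam optimizers with the learning rates reported in the paper.
optim_reconstructor = torch.optim.Adam(image_reconstructor.parameters(), lr=1e-4)
optim_generator = torch.optim.Adam(response_generator.parameters(), lr=1e-3)
```

For the reported 4-GPU data-parallel training with batch size 100, these modules would typically be wrapped in torch.nn.DataParallel or DistributedDataParallel before the training loop.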