Low-Resource Knowledge-Grounded Dialogue Generation

Authors: Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation results on two benchmarks indicate that with only 1/8 training data, our model can achieve the state-of-the-art performance and generalize well on out-of-domain knowledge." "We test the proposed model on Wizard of Wikipedia (Wizard) published in Dinan et al. (2019) and CMU Document Grounded Conversations (CMU DoG) published in Zhou et al. (2018b)."
Researcher Affiliation | Collaboration | (1) Wangxuan Institute of Computer Technology, Peking University, Beijing, China; (2) Center for Data Science, AAIS, Peking University, Beijing, China; (3) Microsoft Corporation, Beijing, China; (4) Beijing Academy of Artificial Intelligence (BAAI), Beijing, China
Pseudocode | No | The paper describes its model components and their mathematical formulations (Equations 1-11) and provides an architecture diagram (Figure 1), but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides links to code for baseline implementations (e.g., "https://github.com/facebookresearch/ParlAI/blob/master/projects/wizard_of_wikipedia" for TMN, "https://github.com/lizekang/ITDD" for ITDD) and evaluation scripts (e.g., "https://github.com/Maluuba/nlg-eval"). However, it does not state that the authors' own source code for the proposed model is open-source, nor does it provide a link to it. (A usage sketch of the nlg-eval toolkit is given after this table.)
Open Datasets | Yes | "We test the proposed model on Wizard of Wikipedia (Wizard) published in Dinan et al. (2019) and CMU Document Grounded Conversations (CMU DoG) published in Zhou et al. (2018b)." "We choose Reddit Conversation Corpus cleaned by Dziri et al. (2018) as DC." "We use the Wikipedia dump published on ParlAI as DP."
Dataset Splits | Yes | "The data is split as a training set, a validation set, and a test set by the data owner." (for Wizard) "The data has been divided into a training set, a validation set, and a test set by the data owner." (for CMU DoG) Table 4 in Appendix A specifies: Wizard of Wikipedia: Train 18,430, Valid 1,948, Test Seen 965, Test Unseen 968; CMU DoG: Train 3,373, Valid 229, Test 619.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using specific software components such as "GloVe (Pennington et al., 2014)", a "recurrent neural network with gated recurrent units (GRUs) (Chung et al., 2014)", the "Adam (Kingma & Ba, 2015) optimizer", and the "Gumbel-Softmax function (Jang et al., 2016)". However, it does not specify version numbers for these software dependencies or for any other key libraries/frameworks used.
Experiment Setup | Yes | "In both Wizard and CMU DoG, we set the size of word embedding as 300, the hidden size of the context encoder, the knowledge encoder, and the decoder as 1024. The context encoder and the decoder have 3 layers respectively. ... All models are learned with Adam (Kingma & Ba, 2015) optimizer with β1 = 0.9, β2 = 0.999, and an initial learning rate = 5e-4. We increase the learning rate linearly for the first 5000 training steps and decrease it thereafter proportionally to the inverse square root of the step number. We set the initial temperature, the minimum temperature, and the anneal rate of gumbel softmax as 1.0, 0.6, and 4e-5 respectively. In training, we choose 64 as the size of mini-batches, and add dropout to gθs and MLPθv, but do not see much difference. Early stopping on validation is adopted as a regularization strategy. We employ beam search in response decoding with a beam size 5." (Sketches of the learning-rate schedule and the Gumbel-Softmax annealing are given after this table.)
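
The Open Source Code row points to the Maluuba nlg-eval toolkit as the evaluation script used for automatic metrics. A minimal sketch of scoring generated responses with that toolkit is shown below; the file names hyp.txt and ref.txt are placeholders chosen here, since the paper does not describe its exact evaluation invocation.

```python
# Sketch: computing automatic metrics with the Maluuba nlg-eval toolkit
# (https://github.com/Maluuba/nlg-eval). File names are hypothetical.
from nlgeval import compute_metrics

# hyp.txt: one generated response per line.
# ref.txt: the aligned ground-truth responses, one per line.
metrics = compute_metrics(hypothesis='hyp.txt', references=['ref.txt'])
print(metrics)  # BLEU-1..4, METEOR, ROUGE-L, CIDEr, plus embedding-based scores
```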
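
The Experiment Setup row reports an initial temperature of 1.0, a minimum temperature of 0.6, and an anneal rate of 4e-5 for the Gumbel-Softmax. The sketch below shows one plausible reading of that schedule in PyTorch; the exponential-decay form follows the common recipe from Jang et al. (2016) and is an assumption, as the authors' code is not released.

```python
# Sketch: Gumbel-Softmax sampling with an annealed temperature, using the
# hyperparameters quoted in the table (initial tau 1.0, minimum tau 0.6,
# anneal rate 4e-5). Exponential decay per training step is an assumption.
import math
import torch
import torch.nn.functional as F

TAU_INIT, TAU_MIN, ANNEAL_RATE = 1.0, 0.6, 4e-5

def annealed_tau(step: int) -> float:
    """Exponentially decay the temperature, clipped at the minimum value."""
    return max(TAU_MIN, TAU_INIT * math.exp(-ANNEAL_RATE * step))

def sample_gumbel_softmax(logits: torch.Tensor, step: int) -> torch.Tensor:
    """Draw a differentiable (soft) sample over the last dimension."""
    return F.gumbel_softmax(logits, tau=annealed_tau(step), hard=False)

# Example: a soft selection over 10 knowledge candidates at step 50,000.
probs = sample_gumbel_softmax(torch.randn(1, 10), step=50_000)
```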
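
The same row describes Adam with β1 = 0.9, β2 = 0.999, an initial learning rate of 5e-4, linear warmup over the first 5,000 steps, and inverse-square-root decay afterwards. A minimal PyTorch sketch of that schedule follows; normalizing the scale to 1.0 at the end of warmup is an assumption, since the paper states only the qualitative shape.

```python
# Sketch: Adam with linear warmup followed by inverse-square-root decay,
# matching the hyperparameters quoted above. The peak-at-warmup
# normalization is an assumption.
import torch

WARMUP_STEPS = 5_000

def lr_scale(step: int) -> float:
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS          # linear warmup
    return (WARMUP_STEPS / step) ** 0.5     # inverse-sqrt decay

model = torch.nn.Linear(1024, 1024)         # stand-in for the full model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
# Per training step: loss.backward(); optimizer.step(); scheduler.step()
```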