Low-Resource Knowledge-Grounded Dialogue Generation

Authors: Xueliang Zhao, Wei Wu, Chongyang Tao, Can Xu, Dongyan Zhao, Rui Yan

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluation results on two benchmarks indicate that with only 1/8 training data, our model can achieve the state-of-the-art performance and generalize well on out-of-domain knowledge." "We test the proposed model on Wizard of Wikipedia (Wizard) published in Dinan et al. (2019) and CMU Document Grounded Conversations (CMU DoG) published in Zhou et al. (2018b)."
Researcher Affiliation | Collaboration | (1) Wangxuan Institute of Computer Technology, Peking University, Beijing, China; (2) Center for Data Science, AAIS, Peking University, Beijing, China; (3) Microsoft Corporation, Beijing, China; (4) Beijing Academy of Artificial Intelligence (BAAI), Beijing, China
Pseudocode | No | The paper describes its model components and their mathematical formulations (Equations 1-11) and provides an architecture diagram (Figure 1), but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides links to code for baseline implementations (e.g., "https://github.com/facebookresearch/ParlAI/blob/master/projects/wizard_of_wikipedia" for TMN, "https://github.com/lizekang/ITDD" for ITDD) and evaluation scripts (e.g., "https://github.com/Maluuba/nlg-eval"). However, it does not state that the authors' own source code for the proposed model is open-source, nor does it provide a link to it. (A usage sketch of the nlg-eval toolkit is given after this table.)
Open Datasets | Yes | "We test the proposed model on Wizard of Wikipedia (Wizard) published in Dinan et al. (2019) and CMU Document Grounded Conversations (CMU DoG) published in Zhou et al. (2018b)." "We choose Reddit Conversation Corpus cleaned by Dziri et al. (2018) as DC." "We use the Wikipedia dump published on ParlAI as DP."
Dataset Splits | Yes | "The data is split as a training set, a validation set, and a test set by the data owner." (for Wizard) "The data has been divided into a training set, a validation set, and a test set by the data owner." (for CMU DoG) Table 4 in Appendix A specifies: Wizard of Wikipedia: Train 18,430, Valid 1,948, Test Seen 965, Test Unseen 968; CMU DoG: Train 3,373, Valid 229, Test 619.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using specific software components such as "GloVe (Pennington et al., 2014)", a "recurrent neural network with gated recurrent units (GRUs) (Chung et al., 2014)", the "Adam (Kingma & Ba, 2015) optimizer", and the "Gumbel-Softmax function (Jang et al., 2016)". However, it does not specify version numbers for these software dependencies or for any other key libraries/frameworks used.
Experiment Setup | Yes | "In both Wizard and CMU DoG, we set the size of word embedding as 300, the hidden size of the context encoder, the knowledge encoder, and the decoder as 1024. The context encoder and the decoder have 3 layers respectively. ... All models are learned with Adam (Kingma & Ba, 2015) optimizer with β1 = 0.9, β2 = 0.999, and an initial learning rate = 5e-4. We increase the learning rate linearly for the first 5000 training steps and decrease it thereafter proportionally to the inverse square root of the step number. We set the initial temperature, the minimum temperature, and the anneal rate of gumbel softmax as 1.0, 0.6, and 4e-5 respectively. In training, we choose 64 as the size of mini-batches, and add dropout to gθs and MLPθv, but do not see much difference. Early stopping on validation is adopted as a regularization strategy. We employ beam search in response decoding with a beam size 5." (Sketches of the learning-rate schedule and the Gumbel-Softmax annealing are given after this table.)
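
The Open Source Code row points to the Maluuba nlg-eval toolkit as the evaluation script used for automatic metrics. A minimal sketch of scoring generated responses with that toolkit is shown below; the file names hyp.txt and ref.txt are placeholders chosen here, since the paper does not describe its exact evaluation invocation.

```python
# Sketch: computing automatic metrics with the Maluuba nlg-eval toolkit
# (https://github.com/Maluuba/nlg-eval). File names are hypothetical.
from nlgeval import compute_metrics

# hyp.txt: one generated response per line.
# ref.txt: the aligned ground-truth responses, one per line.
metrics = compute_metrics(hypothesis='hyp.txt', references=['ref.txt'])
print(metrics)  # BLEU-1..4, METEOR, ROUGE-L, CIDEr, plus embedding-based scores
```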
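
The Experiment Setup row reports an initial temperature of 1.0, a minimum temperature of 0.6, and an anneal rate of 4e-5 for the Gumbel-Softmax. The sketch below shows one plausible reading of that schedule in PyTorch; the exponential-decay form follows the common recipe from Jang et al. (2016) and is an assumption, as the authors' code is not released.

```python
# Sketch: Gumbel-Softmax sampling with an annealed temperature, using the
# hyperparameters quoted in the table (initial tau 1.0, minimum tau 0.6,
# anneal rate 4e-5). Exponential decay per training step is an assumption.
import math
import torch
import torch.nn.functional as F

TAU_INIT, TAU_MIN, ANNEAL_RATE = 1.0, 0.6, 4e-5

def annealed_tau(step: int) -> float:
    """Exponentially decay the temperature, clipped at the minimum value."""
    return max(TAU_MIN, TAU_INIT * math.exp(-ANNEAL_RATE * step))

def sample_gumbel_softmax(logits: torch.Tensor, step: int) -> torch.Tensor:
    """Draw a differentiable (soft) sample over the last dimension."""
    return F.gumbel_softmax(logits, tau=annealed_tau(step), hard=False)

# Example: a soft selection over 10 knowledge candidates at step 50,000.
probs = sample_gumbel_softmax(torch.randn(1, 10), step=50_000)
```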
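
The same row describes Adam with β1 = 0.9, β2 = 0.999, an initial learning rate of 5e-4, linear warmup over the first 5,000 steps, and inverse-square-root decay afterwards. A minimal PyTorch sketch of that schedule follows; normalizing the scale to 1.0 at the end of warmup is an assumption, since the paper states only the qualitative shape.

```python
# Sketch: Adam with linear warmup followed by inverse-square-root decay,
# matching the hyperparameters quoted above. The peak-at-warmup
# normalization is an assumption.
import torch

WARMUP_STEPS = 5_000

def lr_scale(step: int) -> float:
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS          # linear warmup
    return (WARMUP_STEPS / step) ** 0.5     # inverse-sqrt decay

model = torch.nn.Linear(1024, 1024)         # stand-in for the full model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
# Per training step: loss.backward(); optimizer.step(); scheduler.step()
```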