A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots

Authors: Xueliang Zhao, Chongyang Tao, Wei Wu, Can Xu, Dongyan Zhao, Rui Yan

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability. We conduct experiments on two public data sets.
Researcher Affiliation Collaboration Xueliang Zhao[1], Chongyang Tao[1], Wei Wu[2], Can Xu[2], Dongyan Zhao[1,3] and Rui Yan[1,3]. [1] Institute of Computer Science and Technology, Peking University, Beijing, China; [2] Microsoft Corporation, Beijing, China; [3] Center for Data Science, Peking University, Beijing, China. {xl.zhao,chongyangtao,zhaody,ruiyan}@pku.edu.cn, {wuwei,caxu}@microsoft.com
Pseudocode No No explicitly labeled 'Pseudocode' or 'Algorithm' block was found. The paper describes the model components and equations but not in a pseudocode format.
Open Source Code No The paper states: 'All baseline models are implemented with the code shared at https://github.com/facebookresearch/ParlAI/tree/master/projects/personachat and tuned on the validation sets.' This refers to the code for *baseline* models, not the DGMN model developed in this paper. No statement or link for the authors' own source code was provided.
Open Datasets Yes The first data we use is the PERSONA-CHAT data set published in [Zhang et al., 2018]. In addition to PERSONA-CHAT, we also conduct experiments with the CMUDoG data set published recently in [Zhou et al., 2018a].
Dataset Splits Yes The data has been divided into a training set, a validation set, and a test set by the publishers. More statistics of the three sets are shown in Table 2. Early stopping on validation data is adopted as a regularization strategy.
Hardware Specification No No specific hardware details such as GPU models, CPU types, or cloud instance specifications were mentioned for running the experiments.
Software Dependencies No The paper mentions 'Adam [Kingma and Ba, 2015] optimizer' and 'Glove [Pennington et al., 2014]' for pre-trained embeddings, but does not provide specific version numbers for these or other software libraries/frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup Yes We set the size of word embedding as 300. In PERSONA-CHAT, the number of sentences per document is limited to 5 (i.e., m = 5). For each sentence in a document, each utterance in a context, and each response candidate, if the number of words is less than 20, we pad zeros, otherwise, we keep the latest 20 words (i.e., lu = lr = ld = 20). In CMUDoG, we set m = 20 and lu = lr = ld = 40 following the same procedure. In the matching layer of DGMN, the number of filters of CNN is set as 16, and the window sizes of convolution and pooling are both 3. All models are learned using Adam [Kingma and Ba, 2015] optimizer with a learning rate of 0.0001. In training, we choose 32 as the size of mini-batches. Early stopping on validation data is adopted as a regularization strategy.
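The truncation/padding rule quoted above (keep the latest 20 words, otherwise pad with zeros) can be sketched as a small helper. This is a minimal illustrative sketch, not the authors' code: the function name `pad_or_truncate` is hypothetical, and the paper does not specify the padding side, so right-padding is an assumption here.

```python
def pad_or_truncate(token_ids, max_len=20, pad_id=0):
    """Fix a token-id sequence to `max_len` entries.

    - If the sequence is too long, keep only the latest `max_len` tokens,
      as described in the paper's setup.
    - If it is too short, append `pad_id` zeros (padding side assumed).
    """
    if len(token_ids) >= max_len:
        return token_ids[-max_len:]
    return token_ids + [pad_id] * (max_len - len(token_ids))


# Example: a 3-token utterance padded up, a 25-token one truncated down.
short = pad_or_truncate([5, 6, 7], max_len=5)        # [5, 6, 7, 0, 0]
long = pad_or_truncate(list(range(25)), max_len=20)  # tokens 5..24
```

The same helper would apply uniformly to document sentences, context utterances, and response candidates, since the paper uses a single length limit (lu = lr = ld) per data set.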