An Ensemble of Retrieval-Based and Generation-Based Human-Computer Conversation Systems
Authors: Yiping Song, Cheng-Te Li, Jian-Yun Nie, Ming Zhang, Dongyan Zhao, Rui Yan
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that such an ensemble system outperforms each single module by a large margin. |
| Researcher Affiliation | Academia | Institute of Network Computing and Information Systems, School of EECS, Peking University, China; Department of Statistics, National Cheng Kung University, Taiwan; University of Montreal, Canada; Institute of Computer Science and Technology, Peking University, China. {songyiping, mzhang_cs, zhaody, ruiyan}@pku.edu.cn, chengte@mail.ncku.edu.tw, nie@iro.umontreal.ca |
| Pseudocode | No | The paper describes the model architecture and processes with text and diagrams but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | To train our neural models, we implement code based on dl4mt-tutorial (https://github.com/nyu-dl/dl4mt-tutorial), and follow Shang et al. (2015) for hyperparameter settings as it generally works well in our model. The paper states that their code is based on a public tutorial, but does not explicitly state that the code for their specific ensemble methodology is open-source or provided. |
| Open Datasets | Yes | To construct a database for information retrieval, we collected human-human utterances from massive online forums, microblogs, and question-answering communities, including Sina Weibo and Baidu Tieba. ... For the generation part, we use the dataset comprising 1,606,741 query-reply pairs originating from Baidu Tieba. |
| Dataset Splits | Yes | We randomly selected 1.5 million pairs for training and 100K pairs for validation. The remaining 6,741 pairs are used for testing, both for the generation part and for the whole system. Table 2 (Statistics of our datasets): Generator training 1,500,000; validation 100,000; testing 6,741. (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | To train our neural models, we implement code based on dl4mt-tutorial, and follow Shang et al. (2015) for hyperparameter settings as it generally works well in our model. While the paper mentions that its code is based on a tutorial, it does not specify version numbers for any software dependencies such as Theano, Python, or CUDA. |
| Experiment Setup | Yes | All the embeddings are set to 620 dimensions and the hidden states to 1,000 dimensions. We apply AdaDelta [Zeiler, 2012] with a minibatch size of 80. Chinese word segmentation is performed on all utterances. We keep a vocabulary of 100K words for queries and 30K words for the retrieved and generated replies due to efficiency concerns. The validation set is used only for early stopping based on the perplexity measure. (See the configuration sketch after the table.) |
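
The reported split is a simple random partition of the 1,606,741 Baidu Tieba query-reply pairs. Below is a minimal sketch of how such a split could be reproduced; the function name, the seed, and the assumption that the corpus fits in memory as (query, reply) tuples are illustrative, not taken from the paper:

```python
import random

# Counts quoted in the paper: 1,500,000 train, 100,000 validation,
# and the remaining 6,741 pairs for testing (1,606,741 total).
TRAIN_SIZE = 1_500_000
VALID_SIZE = 100_000

def split_pairs(pairs, seed=0):
    """Randomly partition (query, reply) pairs into train/valid/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # the fixed seed is an assumption for repeatability
    train = pairs[:TRAIN_SIZE]
    valid = pairs[TRAIN_SIZE:TRAIN_SIZE + VALID_SIZE]
    test = pairs[TRAIN_SIZE + VALID_SIZE:]  # 6,741 pairs left over
    return train, valid, test
```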
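The quoted hyperparameters map onto a standard encoder-decoder. The paper's implementation follows the Theano-based dl4mt-tutorial, so the PyTorch skeleton below is only a sketch showing where each reported value plugs in; the class structure and the single-layer GRUs without attention are assumptions:

```python
import torch
import torch.nn as nn

EMBED_DIM = 620        # "All the embeddings are set to 620-dimension"
HIDDEN_DIM = 1000      # "the hidden states are set to 1000-dimension"
QUERY_VOCAB = 100_000  # 100K-word vocabulary kept for queries
REPLY_VOCAB = 30_000   # 30K-word vocabulary for retrieved/generated replies

class Seq2SeqSketch(nn.Module):
    """Minimal encoder-decoder; attention and other dl4mt details omitted."""
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(QUERY_VOCAB, EMBED_DIM)
        self.tgt_embed = nn.Embedding(REPLY_VOCAB, EMBED_DIM)
        self.encoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.decoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, REPLY_VOCAB)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_embed(src_ids))        # final encoder state
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), h)
        return self.out(dec_out)                            # logits over reply vocabulary

model = Seq2SeqSketch()
optimizer = torch.optim.Adadelta(model.parameters())  # AdaDelta; minibatch size 80
```

Early stopping on validation perplexity, as the paper describes, would sit in the training loop, which is omitted here.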