Exemplar Guided Neural Dialogue Generation

Authors: Hengyi Cai, Hongshen Chen, Yonghao Song, Xiaofang Zhao, Dawei Yin

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on a large-scale conversation dataset show that the proposed approach significantly outperforms the state-of-the-art in terms of both the quantitative metrics and human evaluations.
Researcher Affiliation | Collaboration | 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 2 University of Chinese Academy of Sciences, Beijing, China; 3 Data Science Lab, JD.com, China; 4 Baidu Inc., China
Pseudocode | No | The paper provides schematic illustrations and algorithmic descriptions in text, but does not contain clearly labeled, structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | To validate our model's effectiveness, we construct an open-domain conversation corpus spanning several publicly available dialogue datasets, including a movie discussions dataset collected from Reddit [Dodge et al., 2015] and a Ubuntu technical corpus [Lowe et al., 2015] discussing the usage of Ubuntu.
Dataset Splits | Yes | 57,402 context-response pairs are sampled for training, 3,000 for validation and 3,000 for testing.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, processor types, memory amounts) used for running the experiments.
Software Dependencies | No | Our model is implemented using ParlAI [Miller et al., 2017]. The Adam [Kingma and Ba, 2014] optimizer with a learning rate of 0.001 is used to train the models. The paper mentions software names but does not provide specific version numbers for them.
Experiment Setup | Yes | We truncate all context utterances to length 100 and response utterances to length 50. We take the most frequent 20,000 words as the conventional vocabulary. Regarding model implementation, the RNNs in the encoder and the decoder utilize 2-layer LSTM structures with 256 hidden cells per layer. The latent variable size is set to 64. The number of latent topics is set to 10. The dimensions of the word embedding and topic embedding matrices are set to 300. Top-10 candidate exemplar responses are retrieved by the exemplar responses retriever in the first retrieval round. The Adam [Kingma and Ba, 2014] optimizer with a learning rate of 0.001 is used to train the models. We use early stopping with log-likelihood on the validation set as the stopping criterion.
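
The reported setup maps directly onto standard sequence-to-sequence hyperparameters. The sketch below is a minimal PyTorch illustration of those reported sizes only; it is not the authors' exemplar-guided model (no code was released), and every module and variable name in it is an illustrative assumption rather than part of the paper.

```python
# Minimal sketch of the hyperparameters quoted in the Experiment Setup row.
# NOT the authors' exemplar-guided dialogue model; names are hypothetical.
import torch
import torch.nn as nn

VOCAB_SIZE = 20_000        # most frequent 20,000 words as the vocabulary
EMBED_DIM = 300            # word / topic embedding dimension
HIDDEN_SIZE = 256          # hidden cells per LSTM layer
NUM_LAYERS = 2             # 2-layer LSTM encoder and decoder
LATENT_SIZE = 64           # latent variable size
NUM_TOPICS = 10            # number of latent topics
MAX_CONTEXT_LEN = 100      # context utterances truncated to 100 tokens
MAX_RESPONSE_LEN = 50      # response utterances truncated to 50 tokens
NUM_EXEMPLARS = 10         # top-10 candidate exemplar responses retrieved
# Data split quoted above: 57,402 train / 3,000 validation / 3,000 test pairs.

word_embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
topic_embedding = nn.Embedding(NUM_TOPICS, EMBED_DIM)

encoder = nn.LSTM(EMBED_DIM, HIDDEN_SIZE, num_layers=NUM_LAYERS, batch_first=True)
decoder = nn.LSTM(EMBED_DIM, HIDDEN_SIZE, num_layers=NUM_LAYERS, batch_first=True)

# Project the final encoder state to latent-variable parameters (mean and
# log-variance), as is common in latent-variable dialogue models; the paper's
# exact parameterization is not reproduced here.
latent_proj = nn.Linear(HIDDEN_SIZE, 2 * LATENT_SIZE)
output_proj = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

params = (
    list(word_embedding.parameters())
    + list(topic_embedding.parameters())
    + list(encoder.parameters())
    + list(decoder.parameters())
    + list(latent_proj.parameters())
    + list(output_proj.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-3)  # Adam, learning rate 0.001
```

The exemplar retriever, the latent topic component, and the training loop with early stopping on validation log-likelihood are omitted, since the paper does not give enough detail to reconstruct them faithfully.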