Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation

Authors: Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both Chinese Weibo dataset and English Subtitle dataset show that the proposed models produce more specific and meaningful responses, yielding better performances against Seq2Seq models in terms of both metric-based and human evaluations.
Researcher Affiliation | Academia | Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu and Xueqi Cheng; University of Chinese Academy of Sciences, Beijing, China; CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences
Pseudocode | No | The paper describes processes and models using equations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We use two public datasets in our experiments. The Chinese Weibo dataset, named STC [Shang et al., 2015]... We also use an English conversation dataset, named Open Subtitles (OSDb) (https://github.com/jiweil/Neural-Dialogue-Generation)...
Dataset Splits | Yes | We randomly split the data into training, validation, and testing sets, which contain 3,000,000, 388,571 and 400,000 pairs, respectively. We split the data into 3,000,000, 400,000 and 400,000 for training, validation and testing, respectively.
Hardware Specification | Yes | We run our model on a Tesla K80 GPU card with Tensorflow.
Software Dependencies | No | The paper mentions 'Tensorflow' as the software framework used for running models but does not provide a specific version number for it or any other software dependencies.
Experiment Setup | Yes | In the training process, the dimension is set to be 300, the size of negative sample is set to be 3, and the learning rate is 0.05. Then we introduce the settings on learning parameters in the deep architecture. For a fair comparison among all the baseline methods and our methods, the numbers of hidden nodes are all set to 300, and batch sizes are set to 200. Stochastic gradient descent (SGD) is utilized in our experiment for optimization, instead of Adam, because SGD yields better performances in our experiments. The learning rate is set to be 0.5, and adaptively decays with rate 0.99 in the optimization.
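For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is not the authors' code (no source code was released); it is a minimal TensorFlow/Keras sketch assuming a standard exponentially decaying SGD schedule, and the decay interval (DECAY_STEPS) is an assumption the paper does not specify.

```python
# Minimal sketch (not the authors' released code) of the training
# configuration quoted above, assuming TensorFlow 2 / Keras.
import tensorflow as tf

# Values quoted from the paper's experiment setup.
EMBEDDING_DIM = 300        # dimension used when training the coherence model
NUM_NEGATIVE_SAMPLES = 3   # negative-sample size for the coherence model
COHERENCE_LR = 0.05        # learning rate for the coherence model
HIDDEN_SIZE = 300          # hidden nodes for all compared models
BATCH_SIZE = 200
INITIAL_LR = 0.5           # SGD learning rate for the deep architecture
DECAY_RATE = 0.99          # adaptive decay rate reported in the paper

# Assumption: the paper does not state how often the rate decays,
# so the decay interval below is a placeholder.
DECAY_STEPS = 1000

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=INITIAL_LR,
    decay_steps=DECAY_STEPS,
    decay_rate=DECAY_RATE,
)

# Plain SGD rather than Adam, since the paper reports SGD performed better.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
```

The coherence-reinforced Seq2Seq model itself is omitted; the sketch only records the reported hyperparameters in runnable form.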