Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation
Authors: Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu, Xueqi Cheng
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both Chinese Weibo dataset and English Subtitle dataset show that the proposed models produce more specific and meaningful responses, yielding better performances against Seq2Seq models in terms of both metric-based and human evaluations. |
| Researcher Affiliation | Academia | Hainan Zhang, Yanyan Lan, Jiafeng Guo, Jun Xu and Xueqi Cheng; University of Chinese Academy of Sciences, Beijing, China; CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes processes and models using equations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We use two public datasets in our experiments. The Chinese Weibo dataset, named STC [Shang et al., 2015]... We also use an English conversation data, named Open Subtitles (OSDb) dataset... (footnote 3: https://github.com/jiweil/Neural-Dialogue-Generation) |
| Dataset Splits | Yes | We randomly split the data to training, validation, and testing sets, which contains 3,000,000, 388,571 and 400,000 pairs, respectively. We split the data to 3,000,000, 400,000 and 400,000 for training, validation and testing, respectively. |
| Hardware Specification | Yes | We run our model on a Tesla K80 GPU card with Tensorflow. |
| Software Dependencies | No | The paper mentions 'Tensorflow' as the software framework used for running models but does not provide a specific version number for it or any other software dependencies. |
| Experiment Setup | Yes | In the training process, the dimension is set to be 300, the size of negative sample is set to be 3, and the learning rate is 0.05. Then we introduce the settings on learning parameters in the deep architecture. For a fair comparison among all the baseline methods and our methods, the numbers of hidden nodes are all set to 300, and batch sizes are set to 200. Stochastic gradient descent (SGD) is utilized in our experiment for optimization, instead of Adam, because SGD yields better performances in our experiments. The learning rate is set to be 0.5, and adaptively decays with rate 0.99 in the optimization. |
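
The Experiment Setup row above reports hyperparameters (300 hidden nodes, batch size 200, SGD with learning rate 0.5 decaying by 0.99) but no released code. The sketch below is only an illustration of that configuration against the TensorFlow 1.x API, a plausible assumption for a 2018 paper; the decay granularity (`DECAY_STEPS`) and all variable names are assumptions, not the authors' implementation.

```python
# Illustrative sketch of the reported optimizer settings (assumed TensorFlow 1.x API);
# this is not the authors' code, which was not released.
import tensorflow as tf

HIDDEN_SIZE = 300    # hidden nodes, per the Experiment Setup row
BATCH_SIZE = 200     # batch size, per the Experiment Setup row
INIT_LR = 0.5        # initial SGD learning rate
DECAY_RATE = 0.99    # reported decay rate
DECAY_STEPS = 1000   # assumption: the paper does not state how often the decay is applied

global_step = tf.Variable(0, trainable=False, name="global_step")

# Learning rate 0.5 that decays multiplicatively by 0.99.
learning_rate = tf.train.exponential_decay(
    INIT_LR, global_step, DECAY_STEPS, DECAY_RATE, staircase=True)

# Plain SGD, which the authors report outperformed Adam in their runs.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# `loss` would be the seq2seq/coherence training objective built from the
# 300-unit encoder and decoder; it is omitted here, so this line stays commented.
# train_op = optimizer.minimize(loss, global_step=global_step)
```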
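
Likewise, the Dataset Splits row describes a random split into fixed-size training, validation, and test sets. A minimal shuffle-and-slice sketch follows; the random seed and the pair representation are assumptions, since the paper only states that the split is random.

```python
import random

def split_pairs(pairs, n_train, n_valid, n_test, seed=0):
    """Randomly split (post, response) pairs into train/valid/test sets.

    The seed and the (post, response) tuple format are assumptions;
    the paper specifies only the split sizes and that the split is random.
    """
    random.seed(seed)
    shuffled = list(pairs)
    random.shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

# Example with the split sizes quoted in the Dataset Splits row:
# train, valid, test = split_pairs(pairs, 3_000_000, 388_571, 400_000)
```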