Topic Aware Neural Response Generation
Authors: Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, significantly outperforming state-of-the-art response generation models. |
| Researcher Affiliation | Collaboration | Chen Xing (1,2), Wei Wu (4), Yu Wu (3), Jie Liu (1,2), Yalou Huang (1,2), Ming Zhou (4), Wei-Ying Ma (4). 1: College of Computer and Control Engineering, Nankai University, Tianjin, China; 2: College of Software, Nankai University, Tianjin, China; 3: State Key Lab of Software Development Environment, Beihang University, Beijing, China; 4: Microsoft Research, Beijing, China. Emails: {v-chxing, wuwei, v-wuyu, mingzhou, wyma}@microsoft.com, {jliu, huangyl}@nankai.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implemented the models with an open source deep learning tool Blocks, and shared the code of our model at https://github.com/LynetteXing1991. |
| Open Datasets | No | We build a data set from Baidu Tieba, the largest Chinese forum, which allows users to post and comment on others' posts. We crawl 20 million post-comment pairs and use them to simulate message-response pairs in conversation... In our experiments, we train a Twitter LDA model using large-scale posts from Sina Weibo, the largest microblogging service in China. The paper describes the data collection but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for the specific datasets used in the experiments. (A hedged topic-model sketch follows the table.) |
| Dataset Splits | Yes | After this preprocessing, there are 15,209,588 pairs left. From them, we randomly sample 5 million distinct message-response pairs as training data, 10,000 distinct pairs as validation data, and 1,000 distinct messages with their responses as test data. (See the split sketch after the table.) |
| Hardware Specification | Yes | All models were initialized with isotropic Gaussian distributions X ~ N(0, 0.01) and trained with an AdaDelta algorithm (Zeiler 2012) on an NVIDIA Tesla K40 GPU. |
| Software Dependencies | No | We implemented the models with an open source deep learning tool Blocks. The Stanford Chinese word segmenter is also mentioned. However, specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | We set the number of topics T as 200 and the hyperparameters of Twitter LDA as α = 1/T, β = 0.01, γ = 0.01... We set the dimensions of the hidden states of the encoder and the decoder as 1000, and the dimensions of word embeddings as 620. All models were initialized with isotropic Gaussian distributions X ~ N(0, 0.01)... The batch size is 128. We set the initial learning rate as 1.0 and reduced it by half if the perplexity on validation began to increase. (A configuration sketch follows the table.) |
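To make the Open Datasets row concrete, here is a minimal sketch of training a topic model with the hyperparameters reported in the Experiment Setup row (T = 200, α = 1/T, β = 0.01). The paper trains a Twitter LDA model on Sina Weibo posts; gensim's standard `LdaModel` is used here as a plainly labeled stand-in, since Twitter LDA is not implemented in gensim, and `tokenized_posts` is a hypothetical input.

```python
from gensim import corpora, models

def train_topic_model(tokenized_posts):
    """Sketch: standard LDA with the paper's reported hyperparameters.

    The paper uses Twitter LDA (not available in gensim); this stand-in
    only illustrates T=200, alpha=1/T, beta=0.01. The gamma parameter of
    Twitter LDA has no counterpart in standard LDA and is omitted.
    """
    dictionary = corpora.Dictionary(tokenized_posts)
    corpus = [dictionary.doc2bow(post) for post in tokenized_posts]
    return models.LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=200,     # T
        alpha=1.0 / 200,    # alpha = 1/T
        eta=0.01,           # "beta" in the paper's notation
    )
```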
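The split sizes in the Dataset Splits row (5 million training pairs, 10,000 validation pairs, 1,000 test messages out of 15,209,588) can be reproduced with simple random sampling. A minimal sketch, assuming `pairs` is a list of already deduplicated `(message, response)` tuples; the shuffling procedure and seed are assumptions, only the split sizes come from the paper.

```python
import random

def split_pairs(pairs, seed=42):
    """Sample train/validation/test splits with the paper's reported sizes."""
    rng = random.Random(seed)
    rng.shuffle(pairs)
    train = pairs[:5_000_000]
    valid = pairs[5_000_000:5_010_000]
    # The test set is built from 1,000 distinct *messages* (with their
    # responses), not from 1,000 pairs.
    seen, test = set(), []
    for msg, resp in pairs[5_010_000:]:
        if msg not in seen:
            seen.add(msg)
            test.append((msg, resp))
        if len(test) == 1_000:
            break
    return train, valid, test
```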
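The Hardware Specification and Experiment Setup rows together pin down most of the training configuration. A minimal sketch in Python, assuming N(0, 0.01) denotes variance 0.01 (std = 0.1) per the usual notation, and one plausible reading of the learning-rate schedule; the seed and helper names are illustrative, only the numeric values come from the paper.

```python
import numpy as np

# Values reported in the Experiment Setup / Hardware Specification rows.
CONFIG = {
    "num_topics": 200,             # T for Twitter LDA
    "lda_alpha": 1.0 / 200,        # alpha = 1/T
    "lda_beta": 0.01,
    "lda_gamma": 0.01,
    "hidden_size": 1000,           # encoder/decoder hidden state dimension
    "embedding_size": 620,         # word embedding dimension
    "batch_size": 128,
    "initial_learning_rate": 1.0,  # trained with AdaDelta (Zeiler 2012)
}

_rng = np.random.default_rng(0)  # seed is an assumption

def init_weights(shape):
    """Isotropic Gaussian initialization X ~ N(0, 0.01).

    N(0, 0.01) is read here as variance 0.01 (std = 0.1); the paper does
    not disambiguate, and std = 0.01 is the other plausible reading.
    """
    return _rng.normal(loc=0.0, scale=np.sqrt(0.01), size=shape)

def next_learning_rate(lr, val_perplexities):
    """Halve the rate once validation perplexity begins to increase.

    One plausible reading of "reduced it by half if the perplexity on
    validation began to increase"; the exact trigger is not specified.
    """
    if len(val_perplexities) >= 2 and val_perplexities[-1] > val_perplexities[-2]:
        return lr / 2.0
    return lr
```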