Hierarchical Recurrent Attention Network for Response Generation
Authors: Chen Xing, Yu Wu, Wei Wu, Yalou Huang, Ming Zhou
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on both automatic evaluation and human judgment show that HRAN can significantly outperform state-of-the-art models for context based response generation. |
| Researcher Affiliation | Collaboration | Chen Xing (1,2), Yu Wu (3), Wei Wu (4), Yalou Huang (1,2), Ming Zhou (4). (1) College of Computer and Control Engineering, Nankai University, Tianjin, China; (2) College of Software, Nankai University, Tianjin, China; (3) State Key Lab of Software Development Environment, Beihang University, Beijing, China; (4) Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our source code and data at https://github.com/LynetteXing1991/HRAN. |
| Open Datasets | Yes | We built a data set from Douban Group [...]. The data will be publicly available. |
| Dataset Splits | Yes | From them, we randomly sampled 1 million conversations as training data, 10,000 conversations as validation data, and 1,000 conversations as test data, and made sure that there is no overlap among them. |
| Hardware Specification | Yes | All models were initialized with isotropic Gaussian distributions X ∼ N(0, 0.01) and trained with an AdaDelta algorithm (Zeiler 2012) on a NVIDIA Tesla K40 GPU. |
| Software Dependencies | No | The paper mentions using the AdaDelta algorithm and the Blocks framework (with a GitHub link), but does not specify version numbers for Blocks or other software libraries. |
| Experiment Setup | Yes | In all models, we set the dimensionality of hidden states of encoders and decoders as 1000, and the dimensionality of word embedding as 620. All models were initialized with isotropic Gaussian distributions X ∼ N(0, 0.01) and trained with an AdaDelta algorithm (Zeiler 2012) on a NVIDIA Tesla K40 GPU. The batch size is 128. We set the initial learning rate as 1.0 and reduced it by half if the perplexity on validation began to increase. |
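The reported setup can be collected into a single configuration sketch. This is a hypothetical Python rendering for illustration only (the authors used the Blocks framework, not this code); the dictionary keys and the `next_lr` helper are assumptions, while the numeric values are taken from the paper.

```python
# Hypothetical sketch of the training configuration reported in the paper.
# Values come from the paper; the structure and names are illustrative.
CONFIG = {
    "hidden_size": 1000,              # encoder/decoder hidden states
    "embedding_size": 620,            # word embedding dimensionality
    "batch_size": 128,
    "init": ("gaussian", 0.0, 0.01),  # isotropic N(0, 0.01) initialization
    "optimizer": "adadelta",          # AdaDelta (Zeiler 2012)
    "initial_lr": 1.0,
}

def next_lr(lr: float, prev_val_ppl: float, curr_val_ppl: float) -> float:
    """Halve the learning rate once validation perplexity begins to increase;
    otherwise keep it unchanged (the schedule described in the paper)."""
    return lr * 0.5 if curr_val_ppl > prev_val_ppl else lr
```

For example, starting from the initial learning rate of 1.0, a validation perplexity that rises between epochs would trigger a reduction to 0.5, while a falling perplexity leaves the rate untouched.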