Hierarchical Recurrent Attention Network for Response Generation

Authors: Chen Xing, Yu Wu, Wei Wu, Yalou Huang, Ming Zhou

Venue: AAAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical studies on both automatic evaluation and human judgment show that HRAN can significantly outperform state-of-the-art models for context based response generation. |
| Researcher Affiliation | Collaboration | Chen Xing (1,2), Yu Wu (3), Wei Wu (4), Yalou Huang (1,2), Ming Zhou (4). 1: College of Computer and Control Engineering, Nankai University, Tianjin, China; 2: College of Software, Nankai University, Tianjin, China; 3: State Key Lab of Software Development Environment, Beihang University, Beijing, China; 4: Microsoft Research, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our source code and data at https://github.com/LynetteXing1991/HRAN. |
| Open Datasets | Yes | We built a data set from Douban Group [...]. The data will be publicly available. |
| Dataset Splits | Yes | From them, we randomly sampled 1 million conversations as training data, 10,000 conversations as validation data, and 1,000 conversations as test data, and made sure that there is no overlap among them. (See the split sketch after the table.) |
| Hardware Specification | Yes | All models were initialized with isotropic Gaussian distributions X ~ N(0, 0.01) and trained with an AdaDelta algorithm (Zeiler 2012) on an NVIDIA Tesla K40 GPU. |
| Software Dependencies | No | The paper mentions using an 'AdaDelta algorithm' and 'Blocks' (with a GitHub link), but does not specify version numbers for Blocks or other software libraries. |
| Experiment Setup | Yes | In all models, we set the dimensionality of hidden states of encoders and decoders as 1000, and the dimensionality of word embedding as 620. All models were initialized with isotropic Gaussian distributions X ~ N(0, 0.01) and trained with an AdaDelta algorithm (Zeiler 2012) on an NVIDIA Tesla K40 GPU. The batch size is 128. We set the initial learning rate as 1.0 and reduced it by half if the perplexity on validation began to increase. (See the configuration sketch after the table.) |
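The Dataset Splits row describes a disjoint random partition of Douban Group conversations into 1,000,000 training, 10,000 validation, and 1,000 test examples. A minimal sketch of such a non-overlapping split, assuming the conversations fit in an in-memory list, could look like the following; the function name and the fixed seed are illustrative only and are not taken from the paper.

```python
import random


def split_conversations(conversations, n_train=1_000_000, n_valid=10_000, n_test=1_000, seed=0):
    """Randomly partition conversations into disjoint train/validation/test sets.

    Mirrors the split sizes quoted from the paper; the seed and the in-memory
    list representation are assumptions made for illustration.
    """
    assert len(conversations) >= n_train + n_valid + n_test
    shuffled = conversations[:]              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # shuffle once, then slice into disjoint parts
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test
```

Because the three sets are consecutive slices of one shuffled list, they cannot overlap, which matches the paper's statement that the splits share no conversations.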
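The Experiment Setup row quotes concrete hyperparameters: hidden size 1000, embedding size 620, isotropic Gaussian initialization N(0, 0.01), AdaDelta with initial learning rate 1.0, batch size 128, and halving the learning rate when validation perplexity starts to rise. The snippet below is a minimal sketch of that configuration in PyTorch, not the authors' released Blocks/Theano code; the placeholder model, the vocabulary size, and the reading of 0.01 as a standard deviation (rather than a variance) are assumptions.

```python
import torch
import torch.nn as nn

HIDDEN_SIZE = 1000     # dimensionality of encoder/decoder hidden states (from the paper)
EMBEDDING_SIZE = 620   # dimensionality of word embeddings (from the paper)
BATCH_SIZE = 128       # batch size (from the paper); would be passed to a DataLoader
INITIAL_LR = 1.0       # initial AdaDelta learning rate (from the paper)
VOCAB_SIZE = 30_000    # assumed; the quoted setup does not state the vocabulary size

# Placeholder components standing in for the HRAN encoder-decoder.
model = nn.ModuleDict({
    "embedding": nn.Embedding(VOCAB_SIZE, EMBEDDING_SIZE),
    "encoder": nn.GRU(EMBEDDING_SIZE, HIDDEN_SIZE, batch_first=True),
})

# "Initialized with isotropic Gaussian distributions X ~ N(0, 0.01)";
# 0.01 is read here as the standard deviation, which may not match the paper's intent.
for param in model.parameters():
    nn.init.normal_(param, mean=0.0, std=0.01)

# "Trained with an AdaDelta algorithm (Zeiler 2012)" with initial learning rate 1.0.
optimizer = torch.optim.Adadelta(model.parameters(), lr=INITIAL_LR)

# "Reduced it by half if the perplexity on validation began to increase":
# ReduceLROnPlateau with factor=0.5 and patience=0 approximates this rule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0
)


def end_of_epoch(validation_perplexity: float) -> None:
    """Call once per epoch; halves the learning rate when perplexity stops improving."""
    scheduler.step(validation_perplexity)
```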