Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

Authors: Iulian Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics.
Researcher Affiliation | Collaboration | Iulian Vlad Serban, University of Montreal, 2920 chemin de la Tour, Montréal, QC, Canada
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | The pre-processed Ubuntu Dialogue Corpus and the coarse representations can be downloaded at http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip and https://github.com/julianser/Ubuntu-Multiresolution-Tools.
Open Datasets | Yes | The specific task we consider is technical support for the Ubuntu operating system; the data we use is the Ubuntu Dialogue Corpus developed by Lowe et al. (2015). The pre-processed Ubuntu Dialogue Corpus and the coarse representations can be downloaded at http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip and https://github.com/julianser/Ubuntu-Multiresolution-Tools.
Dataset Splits | No | The models are trained using early stopping with patience based on the validation set log-likelihood. We choose model hyperparameters such as the number of hidden units, word embedding dimensionality, and learning rate based on the validation set log-likelihood. (No explicit split sizes or percentages are given; the early-stopping procedure is sketched in code below the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | We implement all models in Theano (Theano Development Team 2016). (No specific version number for Theano is provided.)
Experiment Setup | Yes | We train all models w.r.t. the log-likelihood or joint log-likelihood on the training set using Adam (Kingma and Ba 2015). The models are trained using early stopping with patience based on the validation set log-likelihood. We choose model hyperparameters such as the number of hidden units, word embedding dimensionality, and learning rate based on the validation set log-likelihood. We use gradient clipping to stop the parameters from exploding (Pascanu, Mikolov, and Bengio 2012). We define the 20,000 most frequent words as the vocabulary, and map all other words to a special unknown token. Based on several experiments, we fix the word embedding dimensionality to size 300 for all models. At test time, we use a beam search of size 5 for generating the model responses. The RNNLM model has 2000 hidden units... The HRED model has 500, 1000, and 500 hidden units... MrRNN... has 1000, 1000, and 2000 hidden units respectively for the coarse-level encoder, context, and decoder RNNs. The natural language sub-model... has 500, 1000, and 2000 hidden units... The coarse prediction encoder GRU RNN has 500 hidden units.
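
The Dataset Splits and Experiment Setup rows describe Adam training, gradient clipping, and early stopping with patience on the validation log-likelihood, but the authors' Theano code is not quoted. The following is a minimal PyTorch-style sketch of that training recipe, not the paper's implementation: the vocabulary size and embedding dimensionality come from the rows above, while the patience value, clipping threshold, stand-in network, and random data are illustrative assumptions.

    # Hedged sketch (not the authors' Theano code): Adam training with gradient
    # clipping and early stopping with patience on the validation log-likelihood.
    import torch
    import torch.nn as nn

    VOCAB_SIZE = 20000   # 20,000 most frequent words; all others map to an unknown token
    EMBED_DIM = 300      # word embedding dimensionality fixed to 300
    PATIENCE = 5         # assumed patience value (not stated in the paper)
    CLIP_NORM = 1.0      # assumed gradient-clipping threshold (not stated in the paper)

    model = nn.Sequential(
        nn.Embedding(VOCAB_SIZE, EMBED_DIM),
        nn.GRU(EMBED_DIM, 500, batch_first=True),  # stand-in for the MrRNN sub-models
    )
    optimizer = torch.optim.Adam(model.parameters())

    def validation_log_likelihood(model):
        # Placeholder: would compute the log-likelihood on the held-out validation set.
        with torch.no_grad():
            return -float(torch.rand(1))

    best_ll, epochs_without_improvement = float("-inf"), 0
    while epochs_without_improvement < PATIENCE:
        # One "epoch" on a dummy batch standing in for the Ubuntu Dialogue Corpus.
        tokens = torch.randint(0, VOCAB_SIZE, (32, 20))
        optimizer.zero_grad()
        outputs, _ = model(tokens)
        loss = outputs.pow(2).mean()  # stand-in for the negative log-likelihood
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)  # clip exploding gradients
        optimizer.step()

        # Early stopping with patience based on the validation log-likelihood.
        current_ll = validation_log_likelihood(model)
        if current_ll > best_ll:
            best_ll, epochs_without_improvement = current_ll, 0
        else:
            epochs_without_improvement += 1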
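
The Experiment Setup row also notes that responses are generated at test time with a beam search of size 5. Below is a small framework-free sketch of that decoding procedure; the log_prob_fn callable, the toy four-token vocabulary, and the length cap are hypothetical stand-ins for the model's decoder, not details from the paper.

    # Hedged sketch of beam-search decoding with beam size 5, as described above.
    import math

    def beam_search(log_prob_fn, start_token, end_token, beam_size=5, max_len=20):
        # Each beam is a (token sequence, cumulative log-probability) pair.
        beams = [([start_token], 0.0)]
        finished = []
        for _ in range(max_len):
            candidates = []
            for prefix, score in beams:
                if prefix[-1] == end_token:
                    finished.append((prefix, score))
                    continue
                # Expand the prefix with every vocabulary token and its log-probability.
                for token, lp in enumerate(log_prob_fn(prefix)):
                    candidates.append((prefix + [token], score + lp))
            if not candidates:
                break
            # Keep only the beam_size highest-scoring partial responses.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        return max(finished + beams, key=lambda c: c[1])[0]

    # Toy usage: a uniform distribution over four tokens, where token 3 ends the response.
    uniform = lambda prefix: [math.log(0.25)] * 4
    print(beam_search(uniform, start_token=0, end_token=3))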