Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
Authors: Iulian Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics. |
| Researcher Affiliation | Collaboration | Iulian Vlad Serban, University of Montreal, 2920 chemin de la Tour, Montréal, QC, Canada |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | The pre-processed Ubuntu Dialogue Corpus and the coarse representations can be downloaded at http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip and https://github.com/julianser/Ubuntu-Multiresolution-Tools. |
| Open Datasets | Yes | The specific task we consider is technical support for the Ubuntu operating system; the data we use is the Ubuntu Dialogue Corpus developed by Lowe et al. (2015). The pre-processed Ubuntu Dialogue Corpus and the coarse representations can be downloaded at http://www.iulianserban.com/Files/UbuntuDialogueCorpus.zip and https://github.com/julianser/Ubuntu-Multiresolution-Tools. |
| Dataset Splits | No | The models are trained using early stopping with patience based on the validation set log-likelihood. We choose model hyperparameters such as the number of hidden units, word embedding dimensionality, and learning rate based on the validation set log-likelihood. (No explicit split sizes or percentages are given). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | We implement all models in Theano (Theano Development Team 2016). (No specific version number for Theano is provided). |
| Experiment Setup | Yes | We train all models w.r.t. the log-likelihood or joint log-likelihood on the training set using Adam (Kingma and Ba 2015). The models are trained using early stopping with patience based on the validation set log-likelihood. We choose model hyperparameters such as the number of hidden units, word embedding dimensionality, and learning rate based on the validation set log-likelihood. We use gradient clipping to stop the parameters from exploding (Pascanu, Mikolov, and Bengio 2012). We define the 20,000 most frequent words as the vocabulary, and map all other words to a special unknown token. Based on several experiments, we fix the word embedding dimensionality to size 300 for all models. At test time, we use a beam search of size 5 for generating the model responses. The RNNLM model has 2000 hidden units... The HRED model has 500, 1000, and 500 hidden units... MrRNN... has 1000, 1000, and 2000 hidden units respectively for the coarse-level encoder, context, and decoder RNNs. The natural language sub-model... has 500, 1000, and 2000 hidden units... The coarse prediction encoder GRU RNN has 500 hidden units. |
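
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch for anyone attempting a reproduction. This is not the authors' code (the paper's Theano implementation is not reproduced here); the class name, field names, and the `clip_gradients` helper are illustrative assumptions, with only the numeric values taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class MrRNNConfig:
    """Hyperparameters as quoted in the paper's experiment setup (values
    from the source; names are our own)."""
    vocab_size: int = 20_000            # 20,000 most frequent words; rest -> unknown token
    embedding_dim: int = 300            # fixed word embedding size for all models
    beam_size: int = 5                  # beam search width at test time
    optimizer: str = "adam"             # Adam (Kingma and Ba 2015)
    # MrRNN coarse sub-model: coarse-level encoder, context, decoder RNNs
    coarse_hidden: tuple = (1000, 1000, 2000)
    # Natural-language sub-model: encoder, context, decoder RNNs
    nl_hidden: tuple = (500, 1000, 2000)
    coarse_prediction_encoder_hidden: int = 500

def clip_gradients(grads, max_norm):
    """Global-norm gradient clipping in the style of Pascanu et al. (2012):
    rescale all gradients so their joint L2 norm does not exceed max_norm."""
    total_norm = sum(g_i * g_i for g in grads for g_i in g) ** 0.5
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g_i * scale for g_i in g] for g in grads]
    return grads
```

For example, a gradient list with global norm 5.0 clipped at 1.0 is rescaled to norm 1.0, while gradients already below the threshold pass through unchanged.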