Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

Authors: Iulian Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the different variants of our HRED model, and compare against several alternatives, including basic n-gram models (Goodman 2001), a standard (non-hierarchical) RNN trained on the concatenation of the utterances in each triple, and a context-sensitive model (DCGM-I) recently proposed by Sordoni et al. (2015b). Our results are presented in Table 2.
Researcher Affiliation | Academia | Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau. Department of Computer Science and Operations Research, Université de Montréal, Montreal, Canada; {iulian.vlad.serban,alessandro.sordoni,yoshua.bengio,aaron.courville} AT umontreal.ca. School of Computer Science, McGill University, Montreal, Canada; jpineau AT cs.mcgill.ca.
Pseudocode | No | The paper describes the model architecture and training procedures using text and equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The model implementations can be found on GitHub: https://github.com/julianser/hed-dlg and https://github.com/julianser/rnn-lm.
Open Datasets | Yes | We can further pretrain the model on a large non-dialogue corpus, which covers similar topics and types of interactions between interlocutors. One such corpus is the Q-A SubTle corpus containing about 5.5M Q-A pairs constructed from movie subtitles (Ameixa et al. 2014).
Dataset Splits | Yes | To avoid co-dependencies between triples coming from the same movie, we first split the movies into training, validation and test set, and then construct the triples. Statistics are reported in Table 1 (training / validation / test): Movies 484 / 65 / 65; Triples 196,308 / 24,717 / 24,271; Avg. tokens/triple 53 / 53 / 55; Avg. unk/triple 0.97 / 1.22 / 1.19. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper mentions 'GPU memory limitations' but does not specify the GPU model, CPU, or any other hardware used for the experiments.
Software Dependencies | No | The paper states 'Our implementation relied on the open-source Python library Theano (Bastien et al. 2012)' and mentions 'NLTK (Bird, Klein, and Loper 2009)', but it does not give version numbers for these dependencies or for Python itself. (A version-recording sketch follows the table.)
Experiment Setup | Yes | To train the neural network models, we optimized the log-likelihood of the triples using the recently proposed Adam optimizer (Kingma and Ba 2014). Our implementation relied on the open-source Python library Theano (Bastien et al. 2012). The best hyperparameters of the models were chosen by early stopping with patience on the validation set perplexity (Bengio 2012). We initialized the recurrent parameter matrices as orthogonal matrices, and all other parameters from a Gaussian random distribution with mean zero and standard deviation 0.01. For the baseline RNN, we tested hidden state spaces dh = 200, 300 and 400. For HRED we experimented with encoder and decoder hidden state spaces of size 200, 300 and 400. Based on preliminary results and due to GPU memory limitations, we limited ourselves to size 300 when not bootstrapping or bootstrapping from Word2Vec, and to size 400 when bootstrapping from SubTle. Preliminary experiments showed that the context RNN state space at and above 300 performed similarly, so we fixed it at 300 when not bootstrapping or bootstrapping from Word2Vec, and at 1200 when bootstrapping from SubTle. For all models, we used word embeddings of size 400 when bootstrapping from SubTle and of size 300 otherwise. To help generalization, we used the maxout activation function, between the hidden state and the projected word embeddings of the decoder RNN, when not bootstrapping and when bootstrapping from Word2Vec. (Initialization and early-stopping sketches follow the table.)
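
The Dataset Splits row describes partitioning at the movie level first and constructing triples only afterwards, so that no triple from the same movie crosses splits. Below is a minimal Python sketch of that procedure; the data layout (movies as dicts with an "utterances" list) and the helper names (split_movies, extract_triples, build_splits) are hypothetical placeholders, not the authors' code.

```python
# Minimal sketch of the movie-level split: movies are partitioned into
# train/validation/test first, and triples are constructed only afterwards,
# so no triple from the same movie appears in more than one split.
import random
from typing import Dict, List, Tuple


def split_movies(movies: List[dict], n_valid: int = 65, n_test: int = 65,
                 seed: int = 1234) -> Tuple[List[dict], List[dict], List[dict]]:
    """Partition whole movies (not triples) into train/validation/test."""
    rng = random.Random(seed)
    shuffled = movies[:]
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


def extract_triples(movie: dict) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: slide a window of three consecutive utterances."""
    utterances = movie["utterances"]
    return [tuple(utterances[i:i + 3]) for i in range(len(utterances) - 2)]


def build_splits(movies: List[dict]) -> Dict[str, list]:
    """Build triple sets per split, after the movie-level partition."""
    train, valid, test = split_movies(movies)
    return {
        "train": [t for m in train for t in extract_triples(m)],
        "valid": [t for m in valid for t in extract_triples(m)],
        "test": [t for m in test for t in extract_triples(m)],
    }
```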
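
The Software Dependencies row notes that Theano and NLTK are cited without version numbers. Assuming both packages are installed, a minimal way to record the versions actually used would be:

```python
# Record the library versions in use; both __version__ attributes
# exist in released versions of these packages.
import theano
import nltk

print("Theano:", theano.__version__)
print("NLTK:", nltk.__version__)
```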
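
The Experiment Setup row specifies orthogonal initialization for recurrent matrices, zero-mean Gaussian initialization with standard deviation 0.01 for all other parameters, and a maxout activation between the decoder hidden state and the projected word embeddings. The NumPy sketch below illustrates those three ingredients; the matrix shapes and the two-piece maxout are illustrative assumptions, not the authors' Theano implementation.

```python
# Sketch of the initialization scheme and maxout activation described above.
import numpy as np

_rng = np.random.default_rng(0)


def orthogonal_init(shape):
    """Orthogonal matrix via QR decomposition of a square Gaussian matrix."""
    a = _rng.normal(0.0, 1.0, size=(max(shape), max(shape)))
    q, _ = np.linalg.qr(a)
    return q[:shape[0], :shape[1]]


def gaussian_init(shape, std=0.01):
    """All non-recurrent parameters: zero-mean Gaussian, standard deviation 0.01."""
    return _rng.normal(0.0, std, size=shape)


def maxout(x, pieces=2):
    """Maxout activation: max over groups of `pieces` consecutive units."""
    assert x.shape[-1] % pieces == 0
    return x.reshape(x.shape[:-1] + (x.shape[-1] // pieces, pieces)).max(axis=-1)


# Illustrative sizes taken from the row above (decoder hidden state 300).
W_rec = orthogonal_init((300, 300))    # recurrent matrix of the decoder RNN
W_out = gaussian_init((300, 600))      # hypothetical projection to pre-maxout features
h = np.tanh(gaussian_init((1, 300)))   # stand-in decoder hidden state
features = h @ W_out                   # shape (1, 600)
projected = maxout(features)           # shape (1, 300), then mapped to word embeddings
```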
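
The same row mentions choosing hyperparameters by early stopping with patience on validation-set perplexity. A minimal sketch of that loop follows; train_one_epoch, validation_nll, and save_checkpoint are hypothetical placeholders for the actual training and evaluation code.

```python
# Early stopping with patience on validation perplexity.
import math


def perplexity(total_nll: float, total_tokens: int) -> float:
    """Corpus-level perplexity from the summed negative log-likelihood (in nats)."""
    return math.exp(total_nll / total_tokens)


def train_with_patience(model, patience: int = 5, max_epochs: int = 100) -> float:
    best_ppl = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        model.train_one_epoch()                  # hypothetical training pass
        nll, n_tokens = model.validation_nll()   # hypothetical validation pass
        ppl = perplexity(nll, n_tokens)
        if ppl < best_ppl:
            best_ppl = ppl
            epochs_without_improvement = 0
            model.save_checkpoint()              # keep the best model so far
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                            # no improvement for `patience` epochs
    return best_ppl
```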