A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Authors: Iulian Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state. |
| Researcher Affiliation | Collaboration | Iulian Vlad Serban, University of Montreal, 2920 chemin de la Tour, Montréal, QC, Canada; Alessandro Sordoni, Maluuba Inc, 2000 Rue Peel, Montréal, QC, Canada; Ryan Lowe, McGill University, 3480 Rue University, Montréal, QC, Canada; Laurent Charlin, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montréal, QC, Canada; Joelle Pineau, McGill University, 3480 Rue University, Montréal, QC, Canada; Aaron Courville and Yoshua Bengio, University of Montreal, 2920 chemin de la Tour, Montréal, QC, Canada |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | No | The paper states "The Twitter tweet IDs will be made available upon publication.", which refers to data, not the source code for the methodology. No other concrete access to source code for the described methods is provided. |
| Open Datasets | Yes | We experiment on the Twitter Dialogue Corpus (Ritter, Cherry, and Dolan 2011). The dataset is extracted using a procedure similar to Ritter et al. (2011), and is split into training, validation and test sets, containing respectively 749,060, 93,633 and 10,000 dialogues each. The Twitter tweet IDs will be made available upon publication. |
| Dataset Splits | Yes | The dataset is extracted using a procedure similar to Ritter et al. (2011), and is split into training, validation and test sets, containing respectively 749,060, 93,633 and 10,000 dialogues each. |
| Hardware Specification | No | The acknowledgments mention "Calcul Québec (www.calculquebec.ca) and Compute Canada (www.computecanada.ca)", but no specific hardware components such as GPU models, CPU models, or memory details used for the experiments are provided. |
| Software Dependencies | No | The paper states "We implement all models using Theano (Theano Development Team 2016).", but does not specify a version number for Theano or any other software dependencies like the Adam optimizer. |
| Experiment Setup | Yes | We use word embedding dimensionality of size 400. All models were trained with a learning rate of 0.0001 or 0.0002 and with mini-batches containing 40 or 80 training examples. We use truncated back-propagation and gradient clipping. We also multiply the diagonal covariance matrices of the prior and posterior distributions with 0.1 to make training more stable. We drop words in the decoder with a fixed drop rate of 25% and multiply the KL terms in eq. (4) by a scalar, which starts at zero and linearly increases to 1 over the first 60,000 training batches. |
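
The Experiment Setup row reports the paper's training-schedule details: a linear KL-annealing warm-up over the first 60,000 batches, 25% word dropout in the decoder, and a 0.1 multiplier on the diagonal covariances of the prior and posterior. The following is a minimal, hedged sketch of those reported settings in plain Python/NumPy; the function names, the loop structure, and the choice to implement word dropout by substituting an unknown-token id are illustrative assumptions, not the authors' Theano code.

```python
import numpy as np

# Hyperparameters as reported in the "Experiment Setup" row (Serban et al., AAAI 2017).
EMBEDDING_DIM = 400
LEARNING_RATES = (0.0001, 0.0002)   # one of these values per run
BATCH_SIZES = (40, 80)              # one of these values per run
WORD_DROP_RATE = 0.25               # decoder word-dropout rate
COVARIANCE_SCALE = 0.1              # multiplier on prior/posterior diagonal covariances
KL_ANNEAL_BATCHES = 60_000          # linear warm-up of the KL weight


def kl_weight(batch_index: int) -> float:
    """Linear KL annealing: weight starts at 0 and reaches 1 after 60,000 batches."""
    return min(1.0, batch_index / KL_ANNEAL_BATCHES)


def drop_words(token_ids: np.ndarray, unk_id: int, rng: np.random.Generator) -> np.ndarray:
    """Word dropout on decoder inputs.

    The paper only states that 25% of words are dropped; replacing dropped
    tokens with an unknown-token id is one common implementation (an assumption here).
    """
    mask = rng.random(token_ids.shape) < WORD_DROP_RATE
    return np.where(mask, unk_id, token_ids)


# Illustrative use inside a training loop; the reconstruction and KL terms
# are placeholders standing in for the model's variational objective (eq. 4):
#   loss = reconstruction_loss + kl_weight(batch_index) * kl_term
```

KL annealing and decoder word dropout are standard heuristics for keeping the latent variable informative during training of latent-variable sequence models, which is consistent with the paper's stated motivation of making training more stable.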