Hierarchical Reinforcement Learning for Open-Domain Dialog

Authors: Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Shen, Rosalind Picard

AAAI 2020, pp. 8741-8748 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our evaluation reveals that VHRL improves human judgments of conversational quality above state-of-the-art dialog architectures, including Transformer-based models. We use both interactive human evaluation and automatic metrics. |
| Researcher Affiliation | Academia | ¹Harvard University, ²MIT Media Lab. abdelrhman_saleh@college.harvard.edu, {jaquesn, asma_gh, judyshen}@mit.edu, picard@media.mit.edu |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | In addition, we release code for our evaluation platform and our models at https://github.com/natashamjaques/neural_chat. |
| Open Datasets | Yes | All of our models are trained on a corpus of 109K conversations scraped from www.reddit.com/r/CasualConversations, which was shown to result in higher conversation quality than traditional datasets such as Cornell movie dialogs (Ghandeharioun et al. 2019). |
| Dataset Splits | No | The paper mentions training on a dataset and refers to a 'test set', but does not explicitly provide details about train, validation, and test splits (e.g., percentages or specific counts for each). See the illustrative split sketch below. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions ParlAI (Miller et al. 2017) as the basis for their Transformer implementation, but does not provide specific version numbers for ParlAI or any other software dependencies. |
| Experiment Setup | Yes | We limit each model to 3 turns for a total conversation length of 7 utterances. We initialize conversations with randomly sampled starting sentences from the training set and let our model interact with a user simulator which is a fixed copy of itself. Additional training details are given in the extended version of this paper (Saleh et al. 2019b). See the self-play rollout sketch below. |
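
Because the paper leaves the split unspecified, the sketch below shows one conventional way the 109K-conversation corpus could be partitioned. The 80/10/10 ratio, the `split_conversations` helper, and the fixed seed are all illustrative assumptions, not the authors' actual protocol.

```python
import random


def split_conversations(conversations, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and partition a list of conversations into train/val/test.

    Hypothetical helper: the paper reports no split percentages or counts.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible partition
    shuffled = list(conversations)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# With 109,000 conversations this yields roughly 87,200 / 10,900 / 10,900.
```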
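The Experiment Setup excerpt describes self-play rollouts: a seed utterance sampled from the training set, then the model and a fixed copy of itself alternating for 3 turns each, giving 7 utterances in total. A minimal sketch of that loop follows; the `DialogModel` class, its `respond` method, and the starter sentences are hypothetical stand-ins, since the actual models live in the authors' neural_chat repository.

```python
import random


class DialogModel:
    """Placeholder for a trained dialog model (e.g. VHRL)."""

    def respond(self, history):
        # A real model would generate a reply conditioned on the full
        # history; a canned reply keeps this sketch runnable end to end.
        return f"reply to: {history[-1]}"


def self_play(model, simulator, seed_utterance, turns_per_model=3):
    """Roll out 1 seed utterance + 3 turns per model = 7 utterances."""
    history = [seed_utterance]
    for _ in range(turns_per_model):
        history.append(model.respond(history))      # model under evaluation
        history.append(simulator.respond(history))  # user simulator replies
    return history


if __name__ == "__main__":
    # The paper seeds conversations with random training-set sentences;
    # these starters are illustrative stand-ins.
    starters = ["How was your weekend?", "I just finished a great book."]
    model = DialogModel()
    simulator = model  # "a fixed copy of itself"; in training one would freeze a deep copy
    for i, utterance in enumerate(self_play(model, simulator, random.choice(starters))):
        print(f"[{i}] {utterance}")
```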