Hierarchical Reinforcement Learning for Open-Domain Dialog
Authors: Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Shen, Rosalind Picard (pp. 8741-8748)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation reveals that VHRL improves human judgments of conversational quality above state-of-the-art dialog architectures, including Transformer-based models. We use both interactive human evaluation and automatic metrics. |
| Researcher Affiliation | Academia | 1Harvard University, 2MIT Media Lab; abdelrhman_saleh@college.harvard.edu, {jaquesn, asma_gh, judyshen}@mit.edu, picard@media.mit.edu |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | In addition, we release code for our evaluation platform and our models at https://github.com/natashamjaques/neural_chat. |
| Open Datasets | Yes | All of our models are trained on a corpus of 109K conversations scraped from www.reddit.com/r/CasualConversation, which was shown to result in higher conversation quality than traditional datasets such as Cornell movie dialogs (Ghandeharioun et al. 2019). |
| Dataset Splits | No | The paper mentions training on a dataset and refers to a 'test set', but does not explicitly provide details about train, validation, and test splits (e.g., percentages or specific counts for each). |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions 'ParlAI (Miller et al. 2017)' as the basis for their Transformer implementation, but does not provide specific version numbers for ParlAI or any other software dependencies. |
| Experiment Setup | Yes | We limit each model to 3 turns for a total conversation length of 7 utterances. We initialize conversations with randomly sampled starting sentences from the training set and let our model interact with a user simulator which is a fixed copy of itself. Additional training details are given in the extended version of this paper (Saleh et al. 2019b). |
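The experiment-setup row describes a self-play rollout: each model contributes 3 turns, the conversation is seeded with one sentence sampled from the training set, and the partner is a frozen copy of the model, giving 7 utterances total. A minimal sketch of that loop, assuming a hypothetical `generate_reply` stand-in for the actual dialog model (not part of the paper or its released code):

```python
import random

def generate_reply(speaker: str, history: list[str]) -> str:
    # Placeholder for a real model call; a deployed version would
    # condition a trained dialog model on the conversation history.
    return f"{speaker} reply to: {history[-1]}"

def self_play_rollout(starting_sentences: list[str],
                      turns_per_model: int = 3,
                      seed: int = 0) -> list[str]:
    """Roll out a conversation between a model and a frozen copy of itself."""
    random.seed(seed)
    # Seed the conversation with a randomly sampled training sentence.
    history = [random.choice(starting_sentences)]
    for _ in range(turns_per_model):
        history.append(generate_reply("model", history))      # learning model
        history.append(generate_reply("simulator", history))  # fixed copy
    return history

conversation = self_play_rollout(["Hi, how was your weekend?"])
# 1 starter + 3 turns each from model and simulator = 7 utterances
assert len(conversation) == 7
```

With the default of 3 turns per model, the rollout length matches the paper's stated 7-utterance conversations (one seed utterance plus three exchange pairs).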