Hierarchical Reinforcement Learning for Open-Domain Dialog
Authors: Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Shen, Rosalind Picard (pp. 8741-8748)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation reveals that VHRL improves human judgments of conversational quality above state-of-the-art dialog architectures, including Transformer-based models. We use both interactive human evaluation and automatic metrics. |
| Researcher Affiliation | Academia | 1Harvard University, 2MIT Media Lab; abdelrhman_saleh@college.harvard.edu, {jaquesn, asma_gh, judyshen}@mit.edu, picard@media.mit.edu |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | In addition, we release code for our evaluation platform and our models at https://github.com/natashamjaques/neural_chat. |
| Open Datasets | Yes | All of our models are trained on a corpus of 109K conversations scraped from www.reddit.com/r/CasualConversation, which was shown to result in higher conversation quality than traditional datasets such as Cornell movie dialogs (Ghandeharioun et al. 2019). |
| Dataset Splits | No | The paper mentions training on a dataset and refers to a 'test set', but does not explicitly provide details about train, validation, and test splits (e.g., percentages or specific counts for each). |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions 'ParlAI (Miller et al. 2017)' as the basis for their Transformer implementation, but does not provide specific version numbers for ParlAI or any other software dependencies. |
| Experiment Setup | Yes | We limit each model to 3 turns for a total conversation length of 7 utterances. We initialize conversations with randomly sampled starting sentences from the training set and let our model interact with a user simulator which is a fixed copy of itself. Additional training details are given in the extended version of this paper (Saleh et al. 2019b). |
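The experiment-setup row describes a self-play rollout: each model contributes 3 turns, the conversation is seeded with one sentence sampled from the training set, and the partner is a frozen copy of the model, giving 7 utterances total. A minimal sketch of that loop, assuming a hypothetical `generate_reply` stand-in for the actual dialog model (not part of the paper or its released code):

```python
import random

def generate_reply(speaker: str, history: list[str]) -> str:
    # Placeholder for a real model call; a deployed version would
    # condition a trained dialog model on the conversation history.
    return f"{speaker} reply to: {history[-1]}"

def self_play_rollout(starting_sentences: list[str],
                      turns_per_model: int = 3,
                      seed: int = 0) -> list[str]:
    """Roll out a conversation between a model and a frozen copy of itself."""
    random.seed(seed)
    # Seed the conversation with a randomly sampled training sentence.
    history = [random.choice(starting_sentences)]
    for _ in range(turns_per_model):
        history.append(generate_reply("model", history))      # learning model
        history.append(generate_reply("simulator", history))  # fixed copy
    return history

conversation = self_play_rollout(["Hi, how was your weekend?"])
# 1 starter + 3 turns each from model and simulator = 7 utterances
assert len(conversation) == 7
```

With the default of 3 turns per model, the rollout length matches the paper's stated 7-utterance conversations (one seed utterance plus three exchange pairs).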