Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
Authors: Dhawal Gupta, Yinlam Chow, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods in open-domain dialogue to demonstrate their effectiveness with respect to the diversity of intent in generated utterances and overall DM performance. ... Finally, in Section 6, we evaluate our algorithms in open-domain dialogues against their ability to generate utterances with diverse intents and their overall DM performance. |
| Researcher Affiliation | Collaboration | Dhawal Gupta (University of Massachusetts, dgupta@cs.umass.edu); Yinlam Chow (Google Research, yinlamchow@google.com); Aza Tulepbergenov (Google Research, atulep@google.com); Mohammad Ghavamzadeh (Amazon, ghavamza@amazon.com); Craig Boutilier (Google Research, cboutilier@google.com) |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks, though it describes algorithms and provides mathematical formulations. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We obtained these datasets from the Neural Chat datasets of the MIT Media Lab, which is available at the following link: https://affect.media.mit.edu/neural_chat/datasets. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test splits. It mentions using datasets for evaluation but does not detail the splits. |
| Hardware Specification | Yes | Training and evaluation were run on 8 GPU instances with 32GB of RAM and an NVIDIA Tesla P100 graphics card. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Table 8 summarizes the hyper-parameters that were used for training the Q, V functions: Number of layers (Q, V): 3; Activation: ReLU; Hidden Size: 512; Epochs: 100; Max Unroll: 30; Batch Size: 256; Learning Rate: 2×10⁻³; Optimizer: Adam; τ (IQL): 0.9; Dropout (Ens Q, KLC): 0.5 |
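The Table 8 hyper-parameters above can be collected into a machine-readable config for anyone attempting a re-run. The sketch below is ours, not the authors' code: the dict keys are hypothetical names, and `expectile_loss` is only an illustration of how the IQL parameter τ = 0.9 is conventionally used in value regression.

```python
# Hedged sketch: Table 8 hyper-parameters as a config dict, plus an
# illustrative IQL expectile loss. Key names are our own invention.

TABLE8_CONFIG = {
    "num_layers_q_v": 3,       # Number of layers (Q, V)
    "activation": "ReLU",
    "hidden_size": 512,
    "epochs": 100,
    "max_unroll": 30,
    "batch_size": 256,
    "learning_rate": 2e-3,     # 2 x 10^-3
    "optimizer": "Adam",
    "tau_iql": 0.9,            # expectile parameter for IQL
    "dropout_ens_q_klc": 0.5,  # dropout for Ens-Q / KLC variants
}

def expectile_loss(diff: float, tau: float = TABLE8_CONFIG["tau_iql"]) -> float:
    """Asymmetric squared loss used in IQL-style value regression.

    Positive errors (diff = target - prediction > 0) are weighted by tau,
    negative errors by (1 - tau), so tau = 0.9 biases V toward an upper
    expectile of the Q-value distribution.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff
```

With τ = 0.9, an error of +1 costs 0.9 while an error of −1 costs only 0.1, which is the asymmetry that lets IQL approximate a maximum without querying out-of-distribution actions.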