Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Authors: Dhawal Gupta, Yinlam Chow, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our methods in open-domain dialogue to demonstrate their effectiveness with respect to the diversity of intent in generated utterances and overall DM performance. ... Finally, in Section 6, we evaluate our algorithms in open-domain dialogues against their ability to generate utterances with diverse intents and their overall DM performance.
Researcher Affiliation | Collaboration | Dhawal Gupta, University of Massachusetts, dgupta@cs.umass.edu; Yinlam Chow, Google Research, yinlamchow@google.com; Aza Tulepbergenov, Google Research, atulep@google.com; Mohammad Ghavamzadeh, Amazon, ghavamza@amazon.com; Craig Boutilier, Google Research, cboutilier@google.com
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks, though it describes algorithms and provides mathematical formulations.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We obtained these datasets from the Neural Chat datasets of the MIT Media Lab, which is available at the following link: https://affect.media.mit.edu/neural_chat/datasets.
Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test splits. It mentions using datasets for evaluation but does not detail the splits.
Hardware Specification | Yes | Training and evaluation were run on 8 GPU instances with 32GB of RAM and a NVIDIA Tesla P100 graphics card.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | Table 8 summarizes the hyper-parameters that were used for training the Q, V functions. ... Number of layers (Q, V): 3; Activation: ReLU; Hidden Size: 512; Epochs: 100; Max Unroll: 30; Batch Size: 256; Learning Rate: 2 × 10⁻³; Optimizer: Adam; τ (IQL): 0.9; Dropout (Ens Q, KLC): 0.5.
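The reported hyper-parameters can be sketched as a minimal PyTorch setup. This is not the authors' code: the input/output dimensions (a 64-dimensional state, a scalar value output) are illustrative assumptions, and the expectile loss shown is the standard IQL form, using the paper's reported τ = 0.9, dropout 0.5, hidden size 512, and Adam with learning rate 2 × 10⁻³.

```python
import torch
import torch.nn as nn

def make_value_net(input_dim: int, output_dim: int = 1, hidden: int = 512,
                   n_layers: int = 3, dropout: float = 0.5) -> nn.Sequential:
    """Value network matching the reported hyper-parameters:
    3 layers, hidden size 512, ReLU activation, dropout 0.5.
    Input/output dimensions are illustrative assumptions."""
    layers, dim = [], input_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, hidden), nn.ReLU(), nn.Dropout(dropout)]
        dim = hidden
    layers.append(nn.Linear(dim, output_dim))
    return nn.Sequential(*layers)

def expectile_loss(diff: torch.Tensor, tau: float = 0.9) -> torch.Tensor:
    """Standard IQL expectile regression loss with tau = 0.9:
    over-estimation errors (diff > 0) are weighted by tau,
    under-estimation errors by (1 - tau)."""
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff.pow(2)).mean()

# Reported optimizer and learning rate: Adam, 2e-3.
net = make_value_net(input_dim=64)  # state dimension is an assumption
optimizer = torch.optim.Adam(net.parameters(), lr=2e-3)
```

With batch size 256 and max unroll 30, each training step would draw 256 sub-trajectories of at most 30 steps and minimize the expectile loss over the resulting value targets; the paper's 100 epochs then bound the number of passes over the offline dataset.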