A Mixture-of-Expert Approach to RL-based Dialogue Management
Authors: Yinlam Chow, Azamat Tulepbergenov, Ofir Nachum, Dhawal Gupta, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance. We conduct several experiments to test the efficacy of different parts in the MoE-LM, namely (i) the predictive power and diversity of the primitive, (ii) the quality of experts, and (iii) the overall DM performance. |
| Researcher Affiliation | Industry | Google Research {yinlamchow, atulep, ofirnachum, dhawgupta, mkryu, ghavamza, cboutilier}@google.com |
| Pseudocode | No | The paper describes its models and methods in prose and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for the source code of its methodology (e.g., a repository link or an explicit statement of release). |
| Open Datasets | Yes | The first one is the Cornell Movie corpus (Danescu-Niculescu-Mizil and Lee, 2011), which consists of conversations between speakers in different movie lines and has a median conversation length of 3 utterances. The second is the Reddit Casual (Ghandeharioun et al., 2019) conversations, which is a subset of the Reddit corpus that only contains casual conversations on various topics of at least 3 turns and a median of 7 utterances. |
| Dataset Splits | No | The paper mentions using a 'dataset' and 'evaluation set' but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'RoBERTa-based sentiment detector', 'GPT-2 LM', 'GPT-3', and 'DialoGPT', but does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Details of these models can be found in Appendix B.3. For example, in the case of a Gaussian $G_i$, we use the standard REINFORCE (Sutton et al., 1999) algorithm to learn the model parameters $(\mu_i, \sigma_i^2)$ of $G_i$ according to $\{\mu_i, \sigma_i\} \leftarrow \{\mu_i, \sigma_i\} + \eta \, \mathbb{E}_{z' \sim G_i(\cdot \mid z),\, Y \sim p(\cdot \mid z')}\big[\ell_i(X, Y)\, \nabla_{\{\mu_i, \sigma_i\}} \log P_{G_i}(z' \mid z)\big]$, for $i \in \{1, \ldots, m\}$, where $\eta > 0$ is the learning rate. |
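
To make the quoted update concrete, below is a minimal PyTorch sketch of one REINFORCE step for a single Gaussian expert $G_i$. The latent dimension, the toy score function standing in for $\ell_i(X, Y)$, and the plain SGD optimizer are illustrative assumptions, not details taken from the paper or its (unreleased) code.

```python
import torch

# Minimal sketch of the quoted REINFORCE update for one Gaussian expert
# G_i(.|z) over latent utterance embeddings. `latent_dim` and the toy
# score `ell_i` are assumptions for illustration only.
latent_dim = 16
mu = torch.ones(latent_dim, requires_grad=True)          # mean of G_i(.|z)
log_sigma = torch.zeros(latent_dim, requires_grad=True)  # log-std, keeps sigma > 0
eta = 1e-2                                               # learning rate, eta > 0
opt = torch.optim.SGD([mu, log_sigma], lr=eta)

def ell_i(z_prime: torch.Tensor) -> torch.Tensor:
    """Stand-in for ell_i(X, Y): in the paper this scores the utterance Y
    decoded from the sampled latent z'; here it is a toy quadratic score."""
    return -z_prime.pow(2).sum()

for _ in range(200):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    z_prime = dist.sample()                  # z' ~ G_i(.|z)
    score = ell_i(z_prime)                   # ell_i(X, Y), with Y ~ p(.|z')
    log_prob = dist.log_prob(z_prime).sum()  # log P_{G_i}(z'|z)

    # Score-function gradient: ascend E[ell_i * grad log P_{G_i}(z'|z)]
    # by minimizing its negation with SGD.
    opt.zero_grad()
    (-score.detach() * log_prob).backward()
    opt.step()
```

Because `sample()` is non-differentiable, the gradient flows only through `log_prob`, which is exactly the score-function estimator written in the update above; the decoding step $Y \sim p(\cdot \mid z')$ is collapsed into the toy score here.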