A Mixture-of-Expert Approach to RL-based Dialogue Management
Authors: Yinlam Chow, Azamat Tulepbergenov, Ofir Nachum, Dhawal Gupta, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance. We conduct several experiments to test the efficacy of different parts in the MoE-LM, namely (i) the predictive power and diversity of the primitive, (ii) the quality of experts, and (iii) the overall DM performance. |
| Researcher Affiliation | Industry | Google Research {yinlamchow, atulep, ofirnachum, dhawgupta, mkryu, ghavamza, cboutilier}@google.com |
| Pseudocode | No | The paper describes its models and methods in prose and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for the source code of its methodology (e.g., a repository link or an explicit statement of release). |
| Open Datasets | Yes | The first one is the Cornell Movie corpus (Danescu-Niculescu-Mizil and Lee, 2011), which consists of conversations between speakers in different movie lines and has a median conversation length of 3 utterances. The second is the Reddit Casual (Ghandeharioun et al., 2019) conversations, which is a subset of the Reddit corpus that only contains casual conversations on various topics of at least 3 turns and a median of 7 utterances. |
| Dataset Splits | No | The paper mentions using a 'dataset' and 'evaluation set' but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'RoBERTa-based sentiment detector', 'GPT-2 LM', 'GPT-3', and 'DialoGPT', but does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Details of these models can be found in Appendix B.3. For example, in the case of a Gaussian $G_i$, we use the standard REINFORCE (Sutton et al., 1999) algorithm to learn the model parameters $(\mu_i, \sigma_i^2)$ of $G_i$ according to $\{\mu_i, \sigma_i\} \leftarrow \{\mu_i, \sigma_i\} + \eta \, \mathbb{E}_{z' \sim G_i(\cdot \mid z),\, Y \sim p(\cdot \mid z')}\big[\ell_i(X, Y)\, \nabla_{\{\mu_i, \sigma_i\}} \log P_{G_i}(z' \mid z)\big]$, for $i \in \{1, \ldots, m\}$, where $\eta > 0$ is the learning rate. |
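
To make the quoted update concrete, below is a minimal PyTorch sketch of one REINFORCE step for a single Gaussian expert $G_i$. The latent dimension, the toy score function standing in for $\ell_i(X, Y)$, and the plain SGD optimizer are illustrative assumptions, not details taken from the paper or its (unreleased) code.

```python
import torch

# Minimal sketch of the quoted REINFORCE update for one Gaussian expert
# G_i(.|z) over latent utterance embeddings. `latent_dim` and the toy
# score `ell_i` are assumptions for illustration only.
latent_dim = 16
mu = torch.ones(latent_dim, requires_grad=True)          # mean of G_i(.|z)
log_sigma = torch.zeros(latent_dim, requires_grad=True)  # log-std, keeps sigma > 0
eta = 1e-2                                               # learning rate, eta > 0
opt = torch.optim.SGD([mu, log_sigma], lr=eta)

def ell_i(z_prime: torch.Tensor) -> torch.Tensor:
    """Stand-in for ell_i(X, Y): in the paper this scores the utterance Y
    decoded from the sampled latent z'; here it is a toy quadratic score."""
    return -z_prime.pow(2).sum()

for _ in range(200):
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    z_prime = dist.sample()                  # z' ~ G_i(.|z)
    score = ell_i(z_prime)                   # ell_i(X, Y), with Y ~ p(.|z')
    log_prob = dist.log_prob(z_prime).sum()  # log P_{G_i}(z'|z)

    # Score-function gradient: ascend E[ell_i * grad log P_{G_i}(z'|z)]
    # by minimizing its negation with SGD.
    opt.zero_grad()
    (-score.detach() * log_prob).backward()
    opt.step()
```

Because `sample()` is non-differentiable, the gradient flows only through `log_prob`, which is exactly the score-function estimator written in the update above; the decoding step $Y \sim p(\cdot \mid z')$ is collapsed into the toy score here.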