On the Effectiveness of Offline RL for Dialogue Response Generation
Authors: Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a comprehensive evaluation across multiple datasets, models, and metrics. |
| Researcher Affiliation | Collaboration | ASAPP, New York, United States; Cornell University, New York, United States. |
| Pseudocode | No | The paper describes methods in text and equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/asappresearch/dialogue-offline-rl |
| Open Datasets | Yes | MultiWOZ 2.2 (Zang et al., 2020) is a widely used dataset created to evaluate performance of dialogue systems in multi-domain settings. Action-Based Conversations Dataset (ABCD) (Chen et al., 2021a) contains customer-agent conversations... Taskmaster-3 (Byrne et al., 2019): contains 23,789 conversations between users and a system on movie ticketing. |
| Dataset Splits | No | The paper mentions using 'validation loss' to pick checkpoints, but does not explicitly provide the train/validation/test splits (e.g., percentages or counts) for the primary datasets (MultiWOZ 2.2, ABCD, Taskmaster-3). |
| Hardware Specification | Yes | Training is done on an AWS EC2 g5.12xlarge instance, which has 4 NVIDIA A10G GPUs. |
| Software Dependencies | No | The paper mentions using 'huggingface transformers library' (Wolf et al., 2019) and 'trlx' for implementation, but does not specify their version numbers. |
| Experiment Setup | Yes | Hyperparameter details appear in Tables 7, 8, and 9, which specify Model, Batch size, Block size, Max number of epochs, Optimizer, Learning rate, Adam (β1, β2), Adam ϵ, Learning rate scheduler, CQL Scale, τ, γ, PPO value coefficient, and PPO KL initial coefficient (see the configuration sketch after this table). |
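
For anyone attempting a re-run, the sketch below gathers the hyperparameter fields tabulated in Tables 7-9 into a single Python configuration object. It is a minimal illustration only: the field names mirror the paper's tables, but every default value is a placeholder (not an author-reported setting), and the class itself is not part of the released codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OfflineRLTrainConfig:
    """Hypothetical container for the hyperparameters listed in Tables 7-9.

    Every default below is a placeholder for illustration, not a value
    reported in the paper.
    """
    model: str = "gpt2"                             # base LM checkpoint
    batch_size: int = 8
    block_size: int = 512                           # max token context per example
    max_epochs: int = 10
    optimizer: str = "adamw"
    learning_rate: float = 1e-5
    adam_betas: Tuple[float, float] = (0.9, 0.999)  # Adam (β1, β2)
    adam_eps: float = 1e-8                          # Adam ϵ
    lr_scheduler: str = "linear"
    cql_scale: float = 1.0                          # weight on the CQL regularizer
    tau: float = 0.7                                # τ from the paper's tables
    gamma: float = 0.99                             # discount factor γ
    ppo_value_coef: float = 0.5                     # PPO value-loss coefficient
    ppo_kl_init_coef: float = 0.2                   # initial KL penalty for PPO
```

Recording such a config alongside pinned versions of transformers and trlx would also address the missing software-dependency details noted above.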