Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

Authors: Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues.
Researcher Affiliation | Academia | Yang Deng¹, Wenxuan Zhang, Wai Lam², See-Kiong Ng¹, Tat-Seng Chua¹ (¹National University of Singapore; ²The Chinese University of Hong Kong). Emails: {ydeng, seekiong, dcscts}@nus.edu.sg; isakzhang@gmail.com; wlam@se.cuhk.edu.hk
Pseudocode | No | The paper describes methods using mathematical formulations (e.g., Equation 7 for policy gradient) but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code can be accessed via https://github.com/dengyang17/PPDPP.
Open Datasets | Yes | Craigslist Bargain (He et al., 2018) is created under the bargain negotiation setting... ESConv (Liu et al., 2021) is an emotional support conversation dataset... CIMA (Stasaski et al., 2020) is a crowd-sourced dataset...
Dataset Splits | Yes | The statistics of adopted datasets are presented in Table 2. In specific, the human-annotated dialogues in the train set are used for the supervised fine-tuning of the dialogue policy planner, while only the case background information in the dataset is adopted for the reinforcement learning process. ...we randomly split the dataset into train/dev/test sets by 8:1:1.
Hardware Specification | Yes | All the experiments are run on a server equipped with 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions specific LLM models (e.g., 'ChatGPT (gpt-3.5-turbo-0613)', 'Vicuna-13B-delta-v1.1') and RoBERTa, but does not provide specific version numbers for general software dependencies such as programming languages or deep learning frameworks (e.g., Python, PyTorch).
Experiment Setup | Yes | The details of the training process are provided in Appendix B. ...Table 6: Batch Size 16; Training Epochs 10; Learning Rate 6e-6; Max Sequence Length 512; Learning Scheduler Linear; Weight Decay 0.01; Training Episodes 1,000; Learning Rate 1e-6; Max Conversation Turn 8; Discount Factor γ 0.999; Max New Tokens 32.
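The Table 6 values above can be gathered into configuration objects for reproduction. The grouping below into a supervised fine-tuning phase and a reinforcement learning phase is an assumption (suggested by the two distinct learning rates), and the key names are illustrative, not taken from the authors' code.

```python
# Hypothetical reconstruction of Table 6 as two config dicts.
# The SFT/RL split is an assumption inferred from the duplicated
# "Learning Rate" entries; only the numeric values come from the paper.
SFT_CONFIG = {
    "batch_size": 16,            # Batch Size
    "epochs": 10,                # Training Epochs
    "learning_rate": 6e-6,       # Learning Rate (fine-tuning)
    "max_seq_length": 512,       # Max Sequence Length
    "lr_scheduler": "linear",    # Learning Scheduler
    "weight_decay": 0.01,        # Weight Decay
}

RL_CONFIG = {
    "episodes": 1000,            # Training Episodes
    "learning_rate": 1e-6,       # Learning Rate (RL phase)
    "max_conversation_turns": 8, # Max Conversation Turn
    "discount_factor": 0.999,    # Discount Factor gamma
    "max_new_tokens": 32,        # Max New Tokens
}
```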
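The Pseudocode row notes that the paper expresses its policy training only as a formula (Equation 7, a policy gradient). As a rough guide to what such an objective computes, here is a minimal REINFORCE-style sketch in plain Python; the function names, the per-turn reward/log-probability interface, and the use of discounted returns are assumptions for illustration, not the authors' actual algorithm.

```python
# Minimal sketch of a REINFORCE-style policy-gradient objective,
# of the general kind Equation 7 in the paper formalises.
# All names here are hypothetical; gamma = 0.999 matches Table 6.

def discounted_returns(rewards, gamma=0.999):
    """Compute G_t = sum_k gamma**k * r_{t+k} for each dialogue turn t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def policy_gradient_loss(log_probs, rewards, gamma=0.999):
    """Loss = -sum_t G_t * log pi(a_t | s_t).

    Minimising this loss increases the log-probability of actions
    (dialogue strategies) that led to high discounted return.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(g * lp for g, lp in zip(returns, log_probs))
```

For example, with a sparse terminal reward of 1.0 after three turns and gamma = 0.5, the per-turn returns are [0.25, 0.5, 1.0], so earlier strategy choices receive proportionally smaller credit.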