Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

Authors: Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues.
Researcher Affiliation | Academia | Yang Deng¹, Wenxuan Zhang, Wai Lam², See-Kiong Ng¹, Tat-Seng Chua¹ (¹National University of Singapore; ²The Chinese University of Hong Kong). Emails: {ydeng, seekiong, dcscts}@nus.edu.sg; isakzhang@gmail.com; wlam@se.cuhk.edu.hk
Pseudocode | No | The paper describes methods using mathematical formulations (e.g., Equation 7 for policy gradient) but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code can be accessed via https://github.com/dengyang17/PPDPP.
Open Datasets | Yes | Craigslist Bargain (He et al., 2018) is created under the bargain negotiation setting... ESConv (Liu et al., 2021) is an emotional support conversation dataset... CIMA (Stasaski et al., 2020) is a crowd-sourced dataset...
Dataset Splits | Yes | The statistics of adopted datasets are presented in Table 2. In specific, the human-annotated dialogues in the train set are used for the supervised fine-tuning of the dialogue policy planner, while only the case background information in the dataset is adopted for the reinforcement learning process. ...we randomly split the dataset into train/dev/test sets by 8:1:1.
Hardware Specification | Yes | All the experiments are run on a server equipped with 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions specific LLM models (e.g., 'ChatGPT (gpt-3.5-turbo-0613)', 'Vicuna-13B-delta-v1.1') and RoBERTa, but does not provide specific version numbers for general software dependencies such as programming languages or deep learning frameworks (e.g., Python, PyTorch).
Experiment Setup | Yes | The details of the training process are provided in Appendix B. ...Table 6: Batch Size 16; Training Epochs 10; Learning Rate 6e-6; Max Sequence Length 512; Learning Scheduler Linear; Weight Decay 0.01; Training Episodes 1,000; Learning Rate 1e-6; Max Conversation Turn 8; Discount Factor γ 0.999; Max New Tokens 32.
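The Table 6 values above can be gathered into configuration objects for reproduction. The grouping below into a supervised fine-tuning phase and a reinforcement learning phase is an assumption (suggested by the two distinct learning rates), and the key names are illustrative, not taken from the authors' code.

```python
# Hypothetical reconstruction of Table 6 as two config dicts.
# The SFT/RL split is an assumption inferred from the duplicated
# "Learning Rate" entries; only the numeric values come from the paper.
SFT_CONFIG = {
    "batch_size": 16,            # Batch Size
    "epochs": 10,                # Training Epochs
    "learning_rate": 6e-6,       # Learning Rate (fine-tuning)
    "max_seq_length": 512,       # Max Sequence Length
    "lr_scheduler": "linear",    # Learning Scheduler
    "weight_decay": 0.01,        # Weight Decay
}

RL_CONFIG = {
    "episodes": 1000,            # Training Episodes
    "learning_rate": 1e-6,       # Learning Rate (RL phase)
    "max_conversation_turns": 8, # Max Conversation Turn
    "discount_factor": 0.999,    # Discount Factor gamma
    "max_new_tokens": 32,        # Max New Tokens
}
```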
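The Pseudocode row notes that the paper expresses its policy training only as a formula (Equation 7, a policy gradient). As a rough guide to what such an objective computes, here is a minimal REINFORCE-style sketch in plain Python; the function names, the per-turn reward/log-probability interface, and the use of discounted returns are assumptions for illustration, not the authors' actual algorithm.

```python
# Minimal sketch of a REINFORCE-style policy-gradient objective,
# of the general kind Equation 7 in the paper formalises.
# All names here are hypothetical; gamma = 0.999 matches Table 6.

def discounted_returns(rewards, gamma=0.999):
    """Compute G_t = sum_k gamma**k * r_{t+k} for each dialogue turn t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def policy_gradient_loss(log_probs, rewards, gamma=0.999):
    """Loss = -sum_t G_t * log pi(a_t | s_t).

    Minimising this loss increases the log-probability of actions
    (dialogue strategies) that led to high discounted return.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(g * lp for g, lp in zip(returns, log_probs))
```

For example, with a sparse terminal reward of 1.0 after three turns and gamma = 0.5, the per-turn returns are [0.25, 0.5, 1.0], so earlier strategy choices receive proportionally smaller credit.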