Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
Authors: Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues. |
| Researcher Affiliation | Academia | Yang Deng¹, Wenxuan Zhang, Wai Lam², See-Kiong Ng¹, Tat-Seng Chua¹; ¹National University of Singapore, ²The Chinese University of Hong Kong; {ydeng, seekiong, dcscts}@nus.edu.sg, isakzhang@gmail.com, wlam@se.cuhk.edu.hk |
| Pseudocode | No | The paper describes its methods using mathematical formulations (e.g., Equation 7 for the policy gradient) but does not include structured pseudocode or algorithm blocks. (A hedged sketch of such a policy-gradient update is given below the table.) |
| Open Source Code | Yes | The code can be accessed via https://github.com/dengyang17/PPDPP. |
| Open Datasets | Yes | Craigslist Bargain (He et al., 2018) is created under the bargain negotiation setting... ESConv (Liu et al., 2021) is an emotional support conversation dataset... CIMA (Stasaski et al., 2020) is a crowd-sourced dataset... |
| Dataset Splits | Yes | The statistics of adopted datasets are presented in Table 2. Specifically, the human-annotated dialogues in the train set are used for the supervised fine-tuning of the dialogue policy planner, while only the case background information in the dataset is adopted for the reinforcement learning process. ...we randomly split the dataset into train/dev/test sets by 8:1:1. (A minimal split sketch is given below the table.) |
| Hardware Specification | Yes | All the experiments are run on a server equipped with 8 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions specific LLM models (e.g., 'ChatGPT (gpt-3.5-turbo-0613)', 'Vicuna-13B-delta-v1.1') and RoBERTa, but does not provide version numbers for general software dependencies such as the programming language or deep learning framework (e.g., Python, PyTorch). |
| Experiment Setup | Yes | The details of the training process are provided in Appendix B. ...Table 6: Batch Size 16; Training Epochs 10; Learning Rate 6e-6; Max Sequence Length 512; Learning Scheduler: Linear; Weight Decay 0.01; Training Episodes 1,000; Learning Rate 1e-6; Max Conversation Turn 8; Discount Factor γ 0.999; Max New Tokens 32. (These values are collected into a configuration sketch below the table.) |
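
Since the paper reports its training procedure only at the equation level (the policy-gradient objective referred to as Equation 7) and provides no pseudocode, the following is a minimal sketch of what a REINFORCE-style update for a RoBERTa-based policy planner could look like. The class names, action count, and reward source here are illustrative assumptions, not the authors' released implementation; see https://github.com/dengyang17/PPDPP for the actual code.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

# Illustrative REINFORCE-style sketch for a RoBERTa-based dialogue-policy
# planner; names, action count, and reward source are assumptions, not the
# released PPDPP code.
NUM_ACTIONS = 10  # placeholder: the action set depends on the task

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
policy = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=NUM_ACTIONS)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6, weight_decay=0.01)
GAMMA = 0.999  # discount factor reported in Table 6


def update_episode(dialogue_states, chosen_actions, rewards):
    """One policy-gradient step over a finished self-play episode.

    dialogue_states: list of dialogue-history strings, one per turn
    chosen_actions:  list of sampled action indices, one per turn
    rewards:         list of scalar turn rewards, one per turn
    """
    # Discounted returns, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.insert(0, g)

    losses = []
    for state, action, ret in zip(dialogue_states, chosen_actions, returns):
        inputs = tokenizer(state, truncation=True, max_length=512,
                           return_tensors="pt")
        logits = policy(**inputs).logits
        log_prob = torch.log_softmax(logits, dim=-1)[0, action]
        losses.append(-ret * log_prob)  # maximize return-weighted log-probability

    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```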
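
The 8:1:1 train/dev/test split quoted in the Dataset Splits row is a plain random partition; a minimal sketch follows, with the file name and seed as assumptions for illustration.

```python
import json
import random

# Minimal sketch of a random 8:1:1 train/dev/test split; the file name and
# seed are assumptions for illustration.
random.seed(42)
with open("dialogues.json") as f:
    dialogues = json.load(f)

random.shuffle(dialogues)
n = len(dialogues)
train = dialogues[: int(0.8 * n)]
dev = dialogues[int(0.8 * n): int(0.9 * n)]
test = dialogues[int(0.9 * n):]
```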
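
For quick reference, the Table 6 hyperparameters can be collected into a single configuration. Grouping the first block under supervised fine-tuning and the second under reinforcement learning is an assumption inferred from the two training phases the paper describes, not an explicit labeling in Table 6.

```python
# Hyperparameters reported in Table 6 of the paper, grouped by training phase.
# The SFT/RL grouping is an assumption inferred from the paper's two-stage
# training description, not an explicit labeling in the table.
SFT_CONFIG = {
    "batch_size": 16,
    "training_epochs": 10,
    "learning_rate": 6e-6,
    "max_sequence_length": 512,
    "lr_scheduler": "linear",
    "weight_decay": 0.01,
}

RL_CONFIG = {
    "training_episodes": 1_000,
    "learning_rate": 1e-6,
    "max_conversation_turns": 8,
    "discount_factor": 0.999,  # gamma
    "max_new_tokens": 32,      # per generation call
}
```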