Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues
Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Yiheng Sun, Zerui Chen, Ming Liu, Bing Qin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios, even surpassing ChatGPT with only a 1.8-billion-parameter LLM. To verify our approach, we conducted experiments widely on ExTES (Zheng et al. 2023a), ESConv (Liu et al. 2021b) and P4G (Wang et al. 2019b). We compare our method with various baselines, demonstrating its effectiveness. Detailed analysis experiments further support the framework's validity. Extensive experiments across three proactive dialogue benchmarks show our approach outperforms baselines, with analysis confirming its effectiveness. |
| Researcher Affiliation | Academia | ¹Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China; ²Singapore Management University, Singapore; ³School of Computer Science, Fudan University. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the algorithms and framework through text and diagrams (Figure 1), but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We evaluate the proposed framework on two typical applications of proactive dialogues, ExTES (Zheng et al. 2023b) (emotional support) and P4G (Wang et al. 2019a) (persuasion), representing collaborative and non-collaborative dialogue, respectively. ExTES is an extension of ESConv (Liu et al. 2021b), comprising sufficient dialogues for training (11,117 complete dialogues). |
| Dataset Splits | Yes | We randomly divide it into 10,717/200/200 for train/valid/test set. P4G includes 1,017 donation persuasion dialogues where a persuader attempts to persuade a persuadee to donate to a charity called Save the Children. We randomly choose 100/100 dialogues for validation/testing. We take the remaining 817 dialogues as the training set. |
| Hardware Specification | No | The paper mentions 'Due to the hardware limitations, we select models under 7B parameters.' but does not provide specific details about the GPU/CPU models, memory, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions specific models like 'RoBERTa-Large (Liu et al. 2019)' and 'Qwen1.5-1.8b (Bai et al. 2023)', and also the critic 'ChatGPT (gpt-3.5-turbo-0613 and -0125)', but it does not list any ancillary software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA versions) required to reproduce the environment. |
| Experiment Setup | Yes | LDPP is implemented with (T, L, K) = (8, 6, 24). Experiments are conducted on the ExTES dataset with K = 6, 12, 18, and 24, while keeping other hyper-parameters constant (T = 8, L = 4). We set T as 2, 8, 16, and 24 while keeping (L = 4, K = 24). LoRA Finetuning (x, y) means setting lora rank = x and lora alpha = y, e.g., LoRA Finetuning (32, 64) and LoRA Finetuning (64, 128). τ₀ is the hyperparameter. And δ is a predefined threshold. |
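The random splits quoted above (ExTES: 11,117 dialogues divided into 10,717/200/200; P4G: 1,017 dialogues into 817/100/100) can be sketched as a seeded shuffle-and-slice partition. This is a minimal illustration of the arithmetic, not the authors' code; the function name `random_split` and the fixed seed are assumptions for reproducibility of the sketch.

```python
import random

def random_split(items, n_valid, n_test, seed=0):
    """Randomly partition items into train/valid/test sets.

    Shuffles a copy with a seeded RNG, carves off the validation
    and test sets, and returns the remainder as the training set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    valid = items[:n_valid]
    test = items[n_valid:n_valid + n_test]
    train = items[n_valid + n_test:]
    return train, valid, test

# ExTES: 11,117 dialogues -> 10,717 train / 200 valid / 200 test
ex_train, ex_valid, ex_test = random_split(range(11117), n_valid=200, n_test=200)

# P4G: 1,017 dialogues -> 817 train / 100 valid / 100 test
p4g_train, p4g_valid, p4g_test = random_split(range(1017), n_valid=100, n_test=100)
```

The seeded `random.Random` instance keeps the partition deterministic across runs, which matters when the paper reports results on a fixed test set.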