Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues

Authors: Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Yiheng Sun, Zerui Chen, Ming Liu, Bing Qin

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that LDPP outperforms existing methods in two proactive scenarios, even surpassing ChatGPT with only a 1.8-billion-parameter LLM. To verify our approach, we conducted extensive experiments on ExTES (Zheng et al. 2023a), ESConv (Liu et al. 2021b), and P4G (Wang et al. 2019b). We compare our method with various baselines, demonstrating its effectiveness, and detailed analysis experiments further support the framework's validity. Extensive experiments across three proactive dialogue benchmarks show our approach outperforms baselines, with analysis confirming its effectiveness.
Researcher Affiliation | Academia | 1. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China; 2. Singapore Management University, Singapore; 3. School of Computer Science, Fudan University
Pseudocode | No | The paper describes the algorithms and framework through text and diagrams (Figure 1), but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate the proposed framework on two typical applications of proactive dialogue, ExTES (Zheng et al. 2023b) (emotional support) and P4G (Wang et al. 2019a) (persuasion), representing collaborative and non-collaborative dialogue, respectively. ExTES is an extension of ESConv (Liu et al. 2021b), comprising sufficient dialogues for training (11,117 complete dialogues).
Dataset Splits | Yes | We randomly divide ExTES into 10,717/200/200 dialogues for the train/valid/test sets. P4G includes 1,017 donation-persuasion dialogues in which a persuader attempts to persuade a persuadee to donate to a charity called Save the Children. We randomly choose 100/100 dialogues for validation/testing and take the remaining 817 dialogues as the training set.
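The reported splits amount to a simple random partition of each corpus. The sketch below reproduces the ExTES 10,717/200/200 split; the function name and the fixed seed are illustrative assumptions, since the paper does not state a seed.

```python
import random

def split_dialogues(dialogues, n_valid=200, n_test=200, seed=0):
    """Randomly partition dialogues into train/valid/test sets.

    Mirrors the reported ExTES split (10,717/200/200). The seed is an
    assumption for reproducibility of this sketch, not from the paper.
    """
    rng = random.Random(seed)
    shuffled = list(dialogues)
    rng.shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test

# ExTES comprises 11,117 complete dialogues.
train, valid, test = split_dialogues(range(11117))
# len(train), len(valid), len(test) -> 10717, 200, 200
```

The same function with `n_valid=100, n_test=100` over 1,017 dialogues would yield the reported P4G split (817/100/100).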
Hardware Specification | No | The paper mentions "Due to the hardware limitations, we select models under 7B parameters," but does not provide specific details about the GPU/CPU models, memory, or other hardware used for running the experiments.
Software Dependencies | No | The paper mentions specific models such as RoBERTa-Large (Liu et al. 2019) and Qwen1.5-1.8b (Bai et al. 2023), as well as the critic ChatGPT (gpt-3.5-turbo-0613 and -0125), but it does not list any ancillary software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA versions) required to reproduce the environment.
Experiment Setup | Yes | LDPP is implemented with (T, L, K) = (8, 6, 24). Experiments are conducted on the ExTES dataset with K = 6, 12, 18, and 24, while keeping other hyperparameters constant (T = 8, L = 4). We set T to 2, 8, 16, and 24 while keeping (L = 4, K = 24). LoRA Finetuning (x, y) means setting lora rank = x and lora alpha = y; the configurations (32, 64) and (64, 128) are used. τ0 is a hyperparameter, and δ is a predefined threshold.
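The "(x, y)" shorthand for LoRA settings expands to the two standard LoRA hyperparameters. A minimal sketch, assuming dict keys named after common LoRA library conventions (e.g. peft's `r` and `lora_alpha`); the key names are not quoted from the paper.

```python
def lora_config(rank, alpha):
    """Expand the paper's 'LoRA Finetuning (x, y)' notation into the
    two hyperparameters it denotes: lora rank = x, lora alpha = y.
    Key names follow common library conventions (an assumption)."""
    return {"lora_rank": rank, "lora_alpha": alpha}

# The two configurations reported in the paper:
configs = [lora_config(32, 64), lora_config(64, 128)]
```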