Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

Authors: Yi Cheng, Wenge Liu, Jian Wang, Chak Tou Leong, Yi Ouyang, Wenjie Li, Xian Wu, Yefeng Zheng

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on persuasion and emotional support dialogues demonstrate the superiority of our method over a set of competitive baselines. |
| Researcher Affiliation | Collaboration | The Hong Kong Polytechnic University; Baidu Inc., Beijing, China; Jarvis Research Center, Tencent YouTu Lab |
| Pseudocode | No | The paper describes the framework components and their realization through prose and mathematical formulas, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes are available at https://github.com/YiCheng98/Cooper. |
| Open Datasets | Yes | Our experiments are conducted on the ESConv dataset (Liu et al. 2021b) and the P4G dataset (Wang et al. 2019). |
| Dataset Splits | Yes | After preprocessing, there are 1,040/130/130 conversations in the training/validation/test sets, with an average of 11.7 rounds of interactions in each dialogue. P4G is a persuasion dialogue dataset, including 1,017 dialogues with an average of 10.4 dialogue rounds. We distribute 867/50/100 conversations into the training/validation/test sets. |
| Hardware Specification | No | The paper mentions that prompt-based modules are implemented with 'gpt-3.5-turbo', but it does not specify any hardware (e.g., GPU/CPU models, memory, or server specifications) used to run the experiments or train the models. |
| Software Dependencies | No | The paper mentions using 'gpt-3.5-turbo' and that the finetuned approach is built upon 'BART', but it does not specify any software dependencies with version numbers (e.g., programming language, library, or framework versions) for the implementation of their method. |
| Experiment Setup | Yes | We set m=4 on the ESConv dataset (i.e., each agent needs to produce four topic candidates) and m=3 on the P4G dataset. We set K=3 on both datasets (i.e., the top-3 topic candidates are used to guide utterance generation). In the global coordination module, we set α=0.9 and τ=0.2. |
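
For reproduction notes, the reported hyperparameters and dataset splits can be collected into a single per-dataset configuration. The sketch below is illustrative only and is not taken from the authors' repository; the class and field names (`CooperConfig`, `DatasetStats`, `CONFIGS`) are hypothetical, while all values come verbatim from the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetStats:
    """Split sizes reported in the paper (number of conversations)."""
    train: int
    valid: int
    test: int

@dataclass(frozen=True)
class CooperConfig:
    """Hyperparameters from the Experiment Setup row; field names are ours."""
    m: int        # topic candidates produced by each agent
    k: int        # top-K candidates used to guide utterance generation
    alpha: float  # α, set in the global coordination module
    tau: float    # τ, set in the global coordination module

# Values as reported: ESConv uses m=4, P4G uses m=3; K, α, τ are shared.
CONFIGS = {
    "ESConv": (DatasetStats(train=1040, valid=130, test=130),
               CooperConfig(m=4, k=3, alpha=0.9, tau=0.2)),
    "P4G":    (DatasetStats(train=867, valid=50, test=100),
               CooperConfig(m=3, k=3, alpha=0.9, tau=0.2)),
}
```

Grouping the values this way makes the one per-dataset difference (m=4 on ESConv vs. m=3 on P4G) explicit at a glance.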