Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

Authors: Yi Cheng, Wenge Liu, Jian Wang, Chak Tou Leong, Yi Ouyang, Wenjie Li, Xian Wu, Yefeng Zheng

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on persuasion and emotional support dialogues demonstrate the superiority of our method over a set of competitive baselines. |
| Researcher Affiliation | Collaboration | The Hong Kong Polytechnic University; Baidu Inc., Beijing, China; Jarvis Research Center, Tencent YouTu Lab |
| Pseudocode | No | The paper describes the framework components and their realization through prose and mathematical formulas, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes are available at https://github.com/YiCheng98/Cooper. |
| Open Datasets | Yes | Our experiments are conducted on the ESConv dataset (Liu et al. 2021b) and the P4G dataset (Wang et al. 2019). |
| Dataset Splits | Yes | After preprocessing, there are 1,040/130/130 conversations in the training/validation/test sets, with an average of 11.7 rounds of interactions in each dialogue. P4G is a persuasion dialogue dataset, including 1,017 dialogues with an average of 10.4 dialogue rounds. We distribute 867/50/100 conversations into the training/validation/test sets. |
| Hardware Specification | No | The paper mentions that prompt-based modules are implemented with 'gpt-3.5-turbo', but it does not specify any hardware (e.g., GPU/CPU models, memory, or server specifications) used to run the experiments or train the models. |
| Software Dependencies | No | The paper mentions using 'gpt-3.5-turbo' and that the finetuned approach is built upon 'BART', but it does not specify any software dependencies with version numbers (e.g., programming language, library, or framework versions) for the implementation of their method. |
| Experiment Setup | Yes | We set m=4 on the ESConv dataset (i.e., each agent needs to produce four topic candidates) and m=3 on the P4G dataset. We set K=3 on both datasets (i.e., the top-3 topic candidates are used to guide utterance generation). In the global coordination module, we set α=0.9 and τ=0.2. |
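
For reproduction notes, the reported hyperparameters and dataset splits can be collected into a single per-dataset configuration. The sketch below is illustrative only and is not taken from the authors' repository; the class and field names (`CooperConfig`, `DatasetStats`, `CONFIGS`) are hypothetical, while all values come verbatim from the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetStats:
    """Split sizes reported in the paper (number of conversations)."""
    train: int
    valid: int
    test: int

@dataclass(frozen=True)
class CooperConfig:
    """Hyperparameters from the Experiment Setup row; field names are ours."""
    m: int        # topic candidates produced by each agent
    k: int        # top-K candidates used to guide utterance generation
    alpha: float  # α, set in the global coordination module
    tau: float    # τ, set in the global coordination module

# Values as reported: ESConv uses m=4, P4G uses m=3; K, α, τ are shared.
CONFIGS = {
    "ESConv": (DatasetStats(train=1040, valid=130, test=130),
               CooperConfig(m=4, k=3, alpha=0.9, tau=0.2)),
    "P4G":    (DatasetStats(train=867, valid=50, test=100),
               CooperConfig(m=3, k=3, alpha=0.9, tau=0.2)),
}
```

Grouping the values this way makes the one per-dataset difference (m=4 on ESConv vs. m=3 on P4G) explicit at a glance.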