Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Enhancing User-Oriented Proactivity in Open-Domain Dialogues with Critic Guidance

Authors: Yufeng Wang, Jinwu Hu, Ziteng Huang, Kunyang Lin, Zitian Zhang, Peihao Chen, Yu Hu, Qianyue Wang, Zhuliang Yu, Bin Sun, Xiaofen Xing, Qingfang Zheng, Mingkui Tan

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: 'Experiments demonstrate that our proposed training method is applicable to different LLMs, improving user-oriented proactivity and attractiveness in open-domain dialogues. Code and appendix are available at github.com/wang678/LLM-UPC.' It contains a dedicated '5 Experiment' section presenting comparative results, ablation studies, and real-user evaluations with performance metrics (Tables 1 and 2, Figures 4 and 5).
Researcher Affiliation | Collaboration | The authors are affiliated with South China University of Technology (Academia), Peng Cheng Laboratory (Public Research), Pazhou Laboratory (Public Research), Tencent AI Lab (Industry), Tencent Robotics X Lab (Industry), Hong Kong Polytechnic University (Academia), and Hunan University (Academia). The mix of university and industry affiliations indicates a collaboration.
Pseudocode | Yes | The paper includes 'Algorithm 1 Dialogue corpus generation in iteration k' and 'Algorithm 2 Iterative Curriculum Learning', both clearly labeled algorithm blocks.
Open Source Code | Yes | Code and appendix are available at github.com/wang678/LLM-UPC.
Open Datasets | No | The paper states: 'Finally, we construct the ISCO-800, a dataset with 800 user backgrounds, to create diverse user agents.' and '3) Construction of a user background dataset ISCO-800.' Although a new dataset is constructed and described, no direct URL, DOI, or specific repository name for accessing the ISCO-800 dataset itself is provided, separate from the general code repository link.
Dataset Splits | Yes | The 800 user agents are divided into training, validation, and test sets (500, 100, and 200 users, respectively) for dialogue generation.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processors, or memory) used to run the experiments are mentioned; the paper only names the large language models used.
Software Dependencies | No | The paper mentions specific LLMs (e.g., Qwen1.5-32B-Chat, GPT-3.5, GPT-4) but does not list the software dependencies or library versions (e.g., Python, PyTorch, TensorFlow, CUDA) needed to replicate the experimental setup.
Experiment Setup | Yes | Appendix B gives the specific hyperparameters: 'In our experiment, α = 3, β = 2, R = 3, T = 5, and K = 4.' These parameters (α, β, maximum regeneration attempts R, dialogue turns T, and maximum number of iterations K) define the experimental setup.
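For convenience, the reported values can be collected into a small configuration sketch. The variable names below are hypothetical (the paper does not define them in code); the authors' repository at github.com/wang678/LLM-UPC should be treated as authoritative.

```python
# Hypothetical configuration mirroring the values reported in the paper's
# Appendix B and the dataset split noted above; names are illustrative,
# not taken from the authors' code.
CONFIG = {
    "alpha": 3,              # α (Appendix B)
    "beta": 2,               # β (Appendix B)
    "max_regenerations": 3,  # R, maximum regeneration attempts
    "dialogue_turns": 5,     # T, dialogue turns per conversation
    "max_iterations": 4,     # K, maximum number of iterations
}

# ISCO-800 user-agent split: 500 train / 100 validation / 200 test
SPLIT = {"train": 500, "val": 100, "test": 200}
assert sum(SPLIT.values()) == 800  # matches the 800 user backgrounds
```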