Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CADP: Towards Better Centralized Learning for Decentralized Execution in MARL
Authors: Yihe Zhou, Shunyu Liu, Yunpeng Qing, Tongya Zheng, Kaixuan Chen, Jie Song, Mingli Song
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on different benchmarks and across various MARL backbones demonstrate that the proposed framework achieves superior performance compared with the state-of-the-art counterparts. Our code is available at https://github.com/zyh1999/CADP. To demonstrate the effectiveness of the proposed CADP framework, we conduct experiments on the StarCraft II micromanagement challenge and Google Research Football benchmark. |
| Researcher Affiliation | Academia | 1. Zhejiang University; 2. Zhejiang Provincial Engineering Research Center for Real-Time Smart Tech in Urban Security Governance, School of Computer and Computing Science, Hangzhou City University; 3. State Key Laboratory of Blockchain and Data Security, Zhejiang University; 4. Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security |
| Pseudocode | Yes | In addition, we provide pseudocode in Appendix D. |
| Open Source Code | Yes | Our code is available at https://github.com/zyh1999/CADP |
| Open Datasets | Yes | To demonstrate the effectiveness of the proposed CADP framework, we conduct experiments on the StarCraft II micromanagement challenge and Google Research Football benchmark. |
| Dataset Splits | No | The paper discusses various scenarios from the StarCraft II micromanagement challenge and Google Research Football benchmark (e.g., "3s5z vs 3s6z", "corridor", "3 vs 1 with keeper scenario"), but it does not provide specific details on how the data was split into training, validation, or test sets in terms of percentages or sample counts. It refers to "learning curves" and evaluation, but the split methodology is not explicitly stated. |
| Hardware Specification | No | The paper mentions "advanced computing resources provided by the Supercomputing Center of Hangzhou City University" in the acknowledgments, but it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions the use of various MARL methods and frameworks like QMIX, VDN, QPLEX, and MAPPO, but it does not provide specific version numbers for these or any underlying software libraries (e.g., Python, PyTorch, TensorFlow, CUDA) that would be needed for reproducibility. |
| Experiment Setup | Yes | The detailed hyperparameters are given in Appendix B. We examine the effect of the coefficient α in 3s5z vs 3s6z scenarios in Figure 5. In GRF benchmark, we set T = 3M in 3 vs 1 with keeper scenario and T = 6M in counterattack easy scenario respectively. |