Dynamic Belief for Decentralized Multi-Agent Cooperative Learning

Authors: Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in the decentralized training settings and comparable results with the state-of-the-art CTDE methods." (Abstract)
Researcher Affiliation | Academia | (1) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China; (2) Peng Cheng Laboratory, Shenzhen, China; (3) School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that the source code for the described method is released, nor does it provide a link or any other concrete means of access.
Open Datasets | Yes | "We evaluate our approach on a fully cooperative environment SMAC [Samvelyan et al., 2019], a standardized decentralized StarCraft II micromanagement environment." (Section 5)
Dataset Splits | No | The paper does not specify training, validation, and test splits (e.g., exact percentages or sample counts) needed for reproduction.
Hardware Specification | No | The Acknowledgements state that "the computing resources of Pengcheng Cloudbrain are used in this research," but no exact GPU/CPU models, processor types, or memory amounts are given.
Software Dependencies | No | The paper mentions software such as Proximal Policy Optimization (PPO) and QMIX but provides no version numbers for these or for any other components needed for replication.
Experiment Setup | Yes | "During training, 8 paralleled episodes are rolled out independently to generate data. The most recent 800 steps before last optimization are used for reference histories in dynamic belief network. And p is set to 0.05 for the adaptive dropout. The loss weights α = 0.5 and β = 0.005." (Section 5, Details)
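The hyperparameters quoted above can be collected into a single configuration object for a reimplementation attempt. This is a minimal sketch: the paper publishes no code, so every key name below is hypothetical; only the values are taken from Section 5.

```python
# Hyperparameters reported in Section 5 of the paper.
# Key names are hypothetical (the paper releases no code); values are as reported.
config = {
    "num_parallel_episodes": 8,      # episodes rolled out in parallel to generate data
    "reference_history_steps": 800,  # most recent steps used as reference histories
                                     # for the dynamic belief network
    "adaptive_dropout_p": 0.05,      # p for the adaptive dropout
    "loss_weight_alpha": 0.5,        # loss weight α
    "loss_weight_beta": 0.005,       # loss weight β
}

if __name__ == "__main__":
    for name, value in config.items():
        print(f"{name} = {value}")
```

Any reimplementation would still need details the paper omits (network sizes, learning rates, optimizer), so treat this as a starting point rather than a complete specification.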