Dynamic Belief for Decentralized Multi-Agent Cooperative Learning

Authors: Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in the decentralized training settings and comparable results with the state-of-the-art CTDE methods." (Abstract)
Researcher Affiliation | Academia | (1) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China; (2) Peng Cheng Laboratory, Shenzhen, China; (3) School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that the source code for the described method is released, nor does it provide a link or any other concrete means of access.
Open Datasets | Yes | "We evaluate our approach on a fully cooperative environment SMAC [Samvelyan et al., 2019], a standardized decentralized StarCraft II micromanagement environment." (Section 5)
Dataset Splits | No | The paper does not specify training, validation, and test splits (e.g., exact percentages or sample counts) needed for reproduction.
Hardware Specification | No | The Acknowledgements state that "the computing resources of Pengcheng Cloudbrain are used in this research," but no exact GPU/CPU models, processor types, or memory amounts are given.
Software Dependencies | No | The paper mentions software such as Proximal Policy Optimization (PPO) and QMIX but provides no version numbers for these or for any other components needed for replication.
Experiment Setup | Yes | "During training, 8 paralleled episodes are rolled out independently to generate data. The most recent 800 steps before last optimization are used for reference histories in dynamic belief network. And p is set to 0.05 for the adaptive dropout. The loss weights α = 0.5 and β = 0.005." (Section 5, Details)
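The hyperparameters quoted above can be collected into a single configuration object for a reimplementation attempt. This is a minimal sketch: the paper publishes no code, so every key name below is hypothetical; only the values are taken from Section 5.

```python
# Hyperparameters reported in Section 5 of the paper.
# Key names are hypothetical (the paper releases no code); values are as reported.
config = {
    "num_parallel_episodes": 8,      # episodes rolled out in parallel to generate data
    "reference_history_steps": 800,  # most recent steps used as reference histories
                                     # for the dynamic belief network
    "adaptive_dropout_p": 0.05,      # p for the adaptive dropout
    "loss_weight_alpha": 0.5,        # loss weight α
    "loss_weight_beta": 0.005,       # loss weight β
}

if __name__ == "__main__":
    for name, value in config.items():
        print(f"{name} = {value}")
```

Any reimplementation would still need details the paper omits (network sizes, learning rates, optimizer), so treat this as a starting point rather than a complete specification.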