Dynamic Belief for Decentralized Multi-Agent Cooperative Learning
Authors: Yunpeng Zhai, Peixi Peng, Chen Su, Yonghong Tian
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the StarCraft II micromanagement task (SMAC) and demonstrate its superior performance in the decentralized training settings and comparable results with the state-of-the-art CTDE methods. (Abstract) |
| Researcher Affiliation | Academia | (1) National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China; (2) Peng Cheng Laboratory, Shenzhen, China; (3) School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific link, explicit statement of release) for the source code of the methodology described. |
| Open Datasets | Yes | We evaluate our approach on a fully cooperative environment SMAC [Samvelyan et al., 2019], a standardized decentralized StarCraft II micromanagement environment. (Section 5) |
| Dataset Splits | No | The paper does not provide specific training, validation, and test dataset splits (e.g., exact percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper mentions 'The computing resources of Pengcheng Cloudbrain are used in this research' in the Acknowledgements, but it does not specify any exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper references methods such as Proximal Policy Optimization (PPO) and QMIX, but it does not provide specific version numbers for these or for any other software components needed for replication. |
| Experiment Setup | Yes | During training, 8 paralleled episodes are rolled out independently to generate data. The most recent 800 steps before last optimization are used for reference histories in dynamic belief network. And p is set to 0.05 for the adaptive dropout. The loss weights α = 0.5 and β = 0.005. (Section 5, Details) |
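For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary key names below are illustrative placeholders, not the authors' actual configuration schema:

```python
# Hyperparameters reported in Section 5 of the paper.
# Key names are hypothetical; only the values come from the paper.
config = {
    "n_parallel_episodes": 8,        # episodes rolled out independently to generate data
    "reference_history_steps": 800,  # recent steps used as reference histories
    "adaptive_dropout_p": 0.05,      # p for the adaptive dropout
    "loss_weight_alpha": 0.5,        # loss weight alpha
    "loss_weight_beta": 0.005,       # loss weight beta
}

# Sanity checks: dropout must be a valid probability, loss weights positive.
assert 0.0 <= config["adaptive_dropout_p"] <= 1.0
assert config["loss_weight_alpha"] > 0 and config["loss_weight_beta"] > 0
```

A sketch like this is a convenient starting point for a replication attempt, though the paper does not specify the remaining training details (e.g., total timesteps or optimizer settings).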