Consensus Learning for Cooperative Multi-Agent Reinforcement Learning
Authors: Zhiwei Xu, Bin Zhang, Dapeng Li, Zeren Zhang, Guangchong Zhou, Hao Chen, Guoliang Fan
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate COLA on variants of the multi-agent particle environments (Lowe et al. 2017), the challenging micromanagement tasks of StarCraft II (Samvelyan et al. 2019), and the mini-scenarios of Google Research Football (Kurach et al. 2020). We demonstrate that COLA outperforms the previous baselines through experimental results. |
| Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences Beijing, China {xuzhiwei2019, zhangbin2020, lidapeng2020, zhangzeren2021, zhouguangchong2021, chenhao2019, guoliang.fan}@ia.ac.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate COLA in three challenging environments: the multi-agent particle environments (MPE), the StarCraft Multi-Agent Challenge (SMAC), and Google Research Football (GRF). The detailed descriptions for the three environments can be found in Appendix A. |
| Dataset Splits | No | The paper mentions running experiments with '5 random seeds' but does not provide explicit details about training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined standard splits for reproducibility beyond just naming the benchmark datasets). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | All hyperparameter settings and the details of the implementation can be found in Appendix B. Since the scenarios in MPE are simple, we set the number of consensus classes to K = 4. For the easy scenarios, however, we still keep K = 4. Furthermore, the number of surviving agents always changes within an episode. Local observations of dead agents are padded with zeros, which violates the viewpoint invariance principle. Therefore, we disregard local observations of dead agents when training the consensus builder. |
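The dead-agent handling quoted above (zero-padded local observations excluded from consensus-builder training) can be sketched as a simple masking step. This is a minimal illustration, not the authors' implementation; the function and variable names are hypothetical, and we assume dead agents are represented exactly as all-zero observation rows, as the paper describes.

```python
import numpy as np

def filter_dead_agent_obs(obs_batch):
    """Drop zero-padded rows (dead agents) from a batch of local observations.

    obs_batch: array of shape (num_agents, obs_dim). Rows that are entirely
    zero are assumed to be padding for dead agents, per the paper's
    description, and are excluded before consensus-builder training.
    """
    alive_mask = np.any(obs_batch != 0, axis=-1)  # True where any feature is nonzero
    return obs_batch[alive_mask]

# Toy batch: agents 0 and 2 are alive, agent 1 is dead (zero padding).
obs = np.array([
    [0.2, -0.1, 0.5],
    [0.0,  0.0, 0.0],
    [0.7,  0.3, -0.4],
])
print(filter_dead_agent_obs(obs).shape)  # (2, 3)
```

Masking rather than retraining on padded rows keeps the consensus builder's inputs consistent with the viewpoint-invariance assumption the paper relies on.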