Consensus Learning for Cooperative Multi-Agent Reinforcement Learning

Authors: Zhiwei Xu, Bin Zhang, Dapeng Li, Zeren Zhang, Guangchong Zhou, Hao Chen, Guoliang Fan

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate COLA on variants of the multi-agent particle environments (Lowe et al. 2017), the challenging micromanagement task of StarCraft II (Samvelyan et al. 2019), and the mini-scenarios of Google Research Football (Kurach et al. 2020). We demonstrate through experimental results that COLA outperforms the previous baselines.
Researcher Affiliation | Academia | Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Beijing, China; {xuzhiwei2019, zhangbin2020, lidapeng2020, zhangzeren2021, zhouguangchong2021, chenhao2019, guoliang.fan}@ia.ac.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate COLA in three challenging environments: the multi-agent particle environments (MPE), the StarCraft Multi-Agent Challenge (SMAC), and Google Research Football (GRF). The detailed descriptions of the three environments can be found in Appendix A. (A minimal environment-setup sketch is given after the table.)
Dataset Splits | No | The paper mentions running experiments with '5 random seeds' but, beyond naming the benchmark environments, does not provide explicit training, validation, or test split details (e.g., percentages, sample counts, or references to predefined standard splits).
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models or memory used for running the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | All hyperparameter settings and implementation details can be found in Appendix B. Since the scenarios in MPE are simple, we set the number of consensus classes to K = 4. For the easy scenarios, however, we still keep K = 4. Furthermore, the number of surviving agents changes within an episode, and local observations of dead agents are padded with zeros, which violates the viewpoint-invariance principle. Therefore, we disregard local observations of dead agents when training the consensus builder. (A sketch of this masking step follows the table.)
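
For the Open Datasets row, the following is a minimal sketch, not taken from the paper (whose code is not released), of how the three open benchmarks could be instantiated for a reproduction attempt. The scenario and map names used here (simple_spread_v3, 3m, academy_3_vs_1_with_keeper) are illustrative placeholders; the paper's exact scenarios are described in its Appendix A, and the MPE import assumes a recent PettingZoo release.

    # Illustrative setup for the three benchmarks named in the paper.
    # Assumes recent releases of the pettingzoo, smac, and gfootball packages.

    # Multi-agent particle environments (Lowe et al. 2017), via the maintained PettingZoo port.
    from pettingzoo.mpe import simple_spread_v3
    mpe_env = simple_spread_v3.parallel_env()

    # StarCraft Multi-Agent Challenge (Samvelyan et al. 2019); map_name is a placeholder.
    from smac.env import StarCraft2Env
    smac_env = StarCraft2Env(map_name="3m")

    # Google Research Football (Kurach et al. 2020); env_name is likewise illustrative.
    import gfootball.env as football_env
    grf_env = football_env.create_environment(env_name="academy_3_vs_1_with_keeper")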
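
For the Experiment Setup row, the statement about disregarding dead agents when training the consensus builder can be made concrete with the sketch below. It assumes an alive-mask is provided by the environment and that the consensus builder yields a per-agent loss; all function and tensor names are hypothetical, since the paper does not release code.

    import torch

    def consensus_builder_loss(per_agent_loss: torch.Tensor,
                               alive_mask: torch.Tensor) -> torch.Tensor:
        """Average a per-agent consensus-builder loss over surviving agents only.

        per_agent_loss: (batch, n_agents) loss computed from each agent's local observation.
        alive_mask:     (batch, n_agents), 1.0 for surviving agents and 0.0 for dead agents,
                        whose zero-padded observations would otherwise bias training.
        """
        # Zero out contributions from dead agents, then normalize by the number of survivors.
        masked = per_agent_loss * alive_mask
        return masked.sum() / alive_mask.sum().clamp(min=1.0)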