Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

Authors: Chenghao Li, Tonghan Wang, Chengjie Wu, Qianchuan Zhao, Jun Yang, Chongjie Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our method achieves state-of-the-art performance on Google Research Football and super hard StarCraft II micromanagement tasks. We benchmark our approach on Google Research Football (GRF) [18], and StarCraft II micromanagement tasks (SMAC) [16]. We compare our approach against multi-agent value-based methods (QMIX [5], QPLEX [6]), variational exploration (MAVEN [25]), and individuality emergence (EOI [26]) methods. We carry out ablation studies to test the contribution of its three main components.
Researcher Affiliation | Academia | Chenghao Li, Tonghan Wang, Chengjie Wu, Qianchuan Zhao, Jun Yang, Chongjie Zhang; Tsinghua University; {lich18, wangth18, wucj19}@mails.tsinghua.edu.cn, {zhaoqc, yangjun603, chongjie}@tsinghua.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Videos are available at https://sites.google.com/view/celebrate-diversity-shared with codes.
Open Datasets | Yes | We benchmark our approach on Google Research Football (GRF) [18], and StarCraft II micromanagement tasks (SMAC) [16].
Dataset Splits | No | The paper discusses training and performance evaluation but does not specify explicit train/validation/test dataset splits (percentages, counts, or predefined splits) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various algorithms and frameworks (e.g., QPLEX, QMIX), but does not provide specific version numbers for any software dependencies.
Experiment Setup | No | The paper mentions hyperparameters such as β and λ but does not provide their specific values or other concrete experimental setup details, such as learning rates, batch sizes, or optimizer settings, in the main text.