Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL
Authors: Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments clearly show that role diversity can serve as a robust measurement for the characteristics of a multi-agent cooperation task and help diagnose whether the policy fits the current multi-agent system for a better policy performance. The main experiments are conducted on MPE [29] and SMAC [40] benchmarks. |
| Researcher Affiliation | Collaboration | Monash University; ReLER Lab, University of Technology Sydney; Beijing Normal University; Huawei Noah's Ark Lab; Sun Yat-sen University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The main experimental platforms are based on the Multi-Agent Particle Environment (MPE) [29] and the StarCraft Multi-Agent Challenge (SMAC) [40] (see the environment sketch after the table). |
| Dataset Splits | No | The paper mentions 'training steps' and 'full training steps' but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or defined subsets for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various algorithms and benchmarks used (e.g., MPE, SMAC, VDN, QMIX) but does not provide specific version numbers for any of these software components or underlying libraries, which would be needed to reconstruct the software environment. |
| Experiment Setup | Yes | All results come from eight random seeds. For policy gradient-based methods, we extend the training steps from the standard 2M to 20M (10 times) as the convergence speed of policy gradient-based methods (e.g. MAPPO, MAA2C) is slower than Q value-based methods. |
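
Both MPE and SMAC are openly released Python environments rather than static datasets. As a hedged illustration (not code from the paper), the SMAC benchmark is typically driven through the `smac` package's `StarCraft2Env` interface, roughly as below; the map name `8m` is only an example and a local StarCraft II installation is assumed:

```python
import numpy as np
from smac.env import StarCraft2Env

# Instantiate one SMAC scenario; the paper evaluates several such maps.
env = StarCraft2Env(map_name="8m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
while not terminated:
    # A learned multi-agent policy would consume these observations/state.
    obs = env.get_obs()
    state = env.get_state()
    actions = []
    for agent_id in range(n_agents):
        # Sample uniformly among currently available actions as a placeholder policy.
        avail_actions = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail_actions)[0]))
    reward, terminated, _ = env.step(actions)
env.close()
```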
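
The experiment-setup row can also be made concrete with a small, self-contained sketch. The eight random seeds and the 2M/20M training budgets come from the paper; the names `RunConfig` and `build_runs` and the overall structure are assumptions made for illustration only:

```python
from dataclasses import dataclass

VALUE_BASED = {"QMIX", "VDN"}
POLICY_GRADIENT = {"MAPPO", "MAA2C"}

@dataclass
class RunConfig:
    algorithm: str
    benchmark: str        # "MPE" or "SMAC"
    seed: int
    total_env_steps: int

def build_runs(algorithms, benchmark, n_seeds=8, base_steps=2_000_000):
    """Expand each algorithm into one run per seed, extending the training
    budget 10x for the slower-converging policy-gradient methods."""
    runs = []
    for algo in algorithms:
        steps = base_steps * 10 if algo in POLICY_GRADIENT else base_steps
        for seed in range(n_seeds):
            runs.append(RunConfig(algo, benchmark, seed, steps))
    return runs

if __name__ == "__main__":
    for run in build_runs(["QMIX", "MAPPO"], benchmark="SMAC"):
        print(run)
```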