Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL
Authors: Siyi Hu, Chuanlong Xie, Xiaodan Liang, Xiaojun Chang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments clearly show that role diversity can serve as a robust measurement for the characteristics of a multi-agent cooperation task and help diagnose whether the policy fits the current multi-agent system for a better policy performance. The main experiments are conducted on MPE [29] and SMAC [40] benchmarks. |
| Researcher Affiliation | Collaboration | Monash University; ReLER Lab, University of Technology Sydney; Beijing Normal University; Huawei Noah's Ark Lab; Sun Yat-sen University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The main experimental platforms are based on the Multi-Agent Particle Environment (MPE) [29] and the StarCraft Multi-Agent Challenge (SMAC) [40] (see the environment sketch after the table). |
| Dataset Splits | No | The paper mentions 'training steps' and 'full training steps' but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or defined subsets for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various algorithms and benchmarks used (e.g., MPE, SMAC, VDN, QMIX) but does not provide specific version numbers for any of these software components or underlying libraries, which would be needed to reconstruct the software environment. |
| Experiment Setup | Yes | All results come from eight random seeds. For policy gradient-based methods, we extend the training steps from the standard 2M to 20M (10 times) as the convergence speed of policy gradient-based methods (e.g. MAPPO, MAA2C) is slower than Q value-based methods. |
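
Both MPE and SMAC are openly released Python environments rather than static datasets. As a hedged illustration (not code from the paper), the SMAC benchmark is typically driven through the `smac` package's `StarCraft2Env` interface, roughly as below; the map name `8m` is only an example and a local StarCraft II installation is assumed:

```python
import numpy as np
from smac.env import StarCraft2Env

# Instantiate one SMAC scenario; the paper evaluates several such maps.
env = StarCraft2Env(map_name="8m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
while not terminated:
    # A learned multi-agent policy would consume these observations/state.
    obs = env.get_obs()
    state = env.get_state()
    actions = []
    for agent_id in range(n_agents):
        # Sample uniformly among currently available actions as a placeholder policy.
        avail_actions = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail_actions)[0]))
    reward, terminated, _ = env.step(actions)
env.close()
```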
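
The experiment-setup row can also be made concrete with a small, self-contained sketch. The eight random seeds and the 2M/20M training budgets come from the paper; the names `RunConfig` and `build_runs` and the overall structure are assumptions made for illustration only:

```python
from dataclasses import dataclass

VALUE_BASED = {"QMIX", "VDN"}
POLICY_GRADIENT = {"MAPPO", "MAA2C"}

@dataclass
class RunConfig:
    algorithm: str
    benchmark: str        # "MPE" or "SMAC"
    seed: int
    total_env_steps: int

def build_runs(algorithms, benchmark, n_seeds=8, base_steps=2_000_000):
    """Expand each algorithm into one run per seed, extending the training
    budget 10x for the slower-converging policy-gradient methods."""
    runs = []
    for algo in algorithms:
        steps = base_steps * 10 if algo in POLICY_GRADIENT else base_steps
        for seed in range(n_seeds):
            runs.append(RunConfig(algo, benchmark, seed, steps))
    return runs

if __name__ == "__main__":
    for run in build_runs(["QMIX", "MAPPO"], benchmark="SMAC"):
        print(run)
```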