SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Authors: Yuhang Jiang, Jianzhun Shao, Shuncheng He, Hongchang Zhang, Xiangyang Ji

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the same instructive pretrained policy's parameters can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football."
Researcher Affiliation | Academia | Yuhang Jiang, Jianzhun Shao, Shuncheng He, Hongchang Zhang, Xiangyang Ji; Department of Automation, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: SPD
Open Source Code | Yes | "Our code is available at https://github.com/thu-rllab/SPD."
Open Datasets | Yes | "We first train SPD on the complicated MARL environment: Google Research Football [18] without environment reward. ... we first evaluate the diversity of coordination policies learned by SPD and URL baselines in Multi-agent Particle Environment [22, 45]." (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper describes the scenarios and environments used but does not specify explicit train/validation/test splits or their proportions.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions software components such as QMIX, the Sinkhorn-Knopp algorithm, and the Kuhn-Munkres algorithm, but it does not specify version numbers for these or any other software dependencies. (An illustrative sketch of the two matching routines follows the table.)
Experiment Setup | Yes | "The hyper-parameters are kept to be the same, and please refer to Appendix B.2 for details."
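
The Open Datasets row points to simulation environments (Google Research Football and the Multi-agent Particle Environment) rather than fixed datasets. As a minimal sketch of what instantiating such an environment involves, the snippet below creates a Google Research Football scenario through the open-source `gfootball` package; the scenario name, observation representation, and number of controlled agents are illustrative assumptions, not settings taken from the paper.

```python
# Minimal sketch: instantiating a Google Research Football scenario.
# The scenario name and player count below are illustrative assumptions;
# the paper's exact configuration is described in its Appendix B.2.
# Note: SPD itself is trained without the environment reward.
import gfootball.env as football_env

env = football_env.create_environment(
    env_name="academy_pass_and_shoot_with_keeper",  # assumed scenario
    representation="simple115v2",                   # flat feature vector per controlled player
    number_of_left_players_agent_controls=2,        # assumed number of controlled agents
    rewards="scoring",
)

obs = env.reset()
for _ in range(10):
    actions = env.action_space.sample()  # random joint action (illustration only)
    obs, reward, done, info = env.step(actions)
    if done:
        obs = env.reset()
```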
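The Software Dependencies row names the Sinkhorn-Knopp and Kuhn-Munkres algorithms without versions. The sketch below is a generic illustration of both routines, using alternating row/column normalization for Sinkhorn-Knopp and `scipy.optimize.linear_sum_assignment` for Kuhn-Munkres; it is not SPD's implementation, and the cost matrix is a made-up example.

```python
# Generic illustrations of the two matching routines named in the paper;
# not SPD's implementation. The cost matrix is a made-up example.
import numpy as np
from scipy.optimize import linear_sum_assignment  # Kuhn-Munkres (Hungarian) solver

def sinkhorn_knopp(cost: np.ndarray, epsilon: float = 0.1, n_iters: int = 50) -> np.ndarray:
    """Approximately doubly-stochastic soft assignment via Sinkhorn-Knopp iterations."""
    K = np.exp(-cost / epsilon)            # Gibbs kernel from the cost matrix
    for _ in range(n_iters):
        K /= K.sum(axis=1, keepdims=True)  # normalize rows
        K /= K.sum(axis=0, keepdims=True)  # normalize columns
    return K

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])

# Soft (approximately doubly-stochastic) assignment.
plan = sinkhorn_knopp(cost)

# Hard one-to-one assignment minimizing total cost.
rows, cols = linear_sum_assignment(cost)

print(plan.round(2))
print(list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum())
```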