Heterogeneous Skill Learning for Multi-agent Tasks

Authors: Yuntao Liu, Yuan Li, Xinhai Xu, Yong Dou, Donghong Liu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we test the performance of our framework on three challenging multi-agent tasks, i.e., the StarCraft II micromanagement multi-agent challenge (SMAC) [22], Google Research Football (GRF) [15] and GoBigger [8]. We compare our approach (HSL) with classical multi-agent value decomposition methods, i.e., QMIX [21] and QPLEX [26], role-based methods, i.e., ROMA [27] and RODE [28], the diversity-based method CDS [4] and the skill-based method HSD [32]. All experiments are conducted over five random seeds. The detailed experimental setup is described in Appendix B.2.
Researcher Affiliation | Academia | Yuntao Liu, Academy of Military Science, Beijing, China (liu-yt@foxmail.com); Yuan Li, Academy of Military Science, Beijing, China (liyuan@nudt.edu.cn); Xinhai Xu, Academy of Military Science, Beijing, China (xuxinhai@nudt.edu.cn); Yong Dou, National University of Defense Technology, Changsha, Hunan, China (douyong@nudt.edu.cn); Donghong Liu, Academy of Military Science, Beijing, China (liu_donghong@sina.com)
Pseudocode | No | The paper does not include a section or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | From the reproducibility checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code is proprietary.'
Open Datasets | Yes | The StarCraft II micromanagement multi-agent challenge (SMAC) [22], Google Research Football (GRF) [15] and GoBigger [8].
Dataset Splits | No | The paper specifies environments/scenarios for training and evaluation but does not define traditional dataset splits (e.g., percentages or sample counts) for training, validation, and testing as commonly done for static datasets. It refers to training steps and evaluation episodes within these dynamic environments.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using PyTorch in Appendix B.2 but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | The hyperparameters are set as follows: learning rate 5e-4 for all agents, discount factor 0.99, and exploration epsilon decayed from 1.0 to 0.05 over 50k steps. The batch size is 32 and the replay buffer size is 5000. For the Adam optimizer, epsilon is 1e-5. The skill representation network, the skill selector, and the skill-based policy network are each two-layer MLPs with hidden size 64. The GRU unit has hidden size 64.
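
For readers re-implementing the reported setup, the following PyTorch sketch collects the architecture and optimizer settings quoted in the Experiment Setup row (two-layer MLPs and a GRU cell with hidden size 64; Adam with learning rate 5e-4 and epsilon 1e-5; discount 0.99; batch size 32; buffer size 5000; epsilon-greedy decay from 1.0 to 0.05 over 50k steps). It is a minimal sketch under these assumptions: the class names, input/output dimensions, and overall wiring are illustrative and are not taken from the paper.

```python
import torch
import torch.nn as nn

HIDDEN = 64  # hidden size reported for the MLPs and the GRU


class TwoLayerMLP(nn.Module):
    """Two-layer MLP with hidden size 64, the shape reported for the skill
    representation network, the skill selector, and the skill-based policy.
    Input/output dimensions are placeholders, not taken from the paper."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class RecurrentAgent(nn.Module):
    """Per-agent network with a GRU cell of hidden size 64, as stated in the setup."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, HIDDEN)
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, n_actions)

    def forward(self, obs, h):
        x = torch.relu(self.fc1(obs))
        h = self.rnn(x, h)          # recurrent hidden state, size 64
        return self.fc2(h), h


# Training hyperparameters reported in the paper.
GAMMA = 0.99
LR = 5e-4
ADAM_EPS = 1e-5
BATCH_SIZE = 32
BUFFER_SIZE = 5000

agent = RecurrentAgent(obs_dim=96, n_actions=12)  # dimensions are illustrative
optimizer = torch.optim.Adam(agent.parameters(), lr=LR, eps=ADAM_EPS)


def epsilon(step: int, start: float = 1.0, end: float = 0.05, anneal: int = 50_000) -> float:
    """Linear epsilon-greedy schedule: 1.0 -> 0.05 over 50k environment steps."""
    return max(end, start - (start - end) * step / anneal)
```

The sketch only fixes the sizes and optimizer settings that the paper states explicitly; how the skill representation network, skill selector, and skill-based policy are connected to the value decomposition backbone is described in the paper itself and is not reproduced here.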