Learning Diverse Policies in MOBA Games via Macro-Goals
Authors: Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, Lanxiao Huang
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the typical MOBA game Honor of Kings demonstrate that MGG can execute diverse policies in different matches and lineups, and also outperform the state-of-the-art methods over 102 heroes. |
| Researcher Affiliation | Industry | 1 Tencent AI Lab, Shenzhen, China; 2 Tencent TiMi L1 Studio, Chengdu, China. {yatminggao,beishi,sherinedu,enginewang,gorvinchen, leolian,frankfhqiu,guoanhan,waihinwang,dericye, leonfu,willyang,jackiehuang}@tencent.com |
| Pseudocode | No | The paper includes figures illustrating the framework and network architecture (Figures 1, 4, and 5) but does not contain any formal pseudocode or algorithm blocks with structured, code-like steps. |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing source code for the methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | No | We construct a training dataset by collecting replays from the top 1% human players to train the Meta-Controller. The paper does not provide a link, DOI, or specific repository name for this custom-collected dataset, nor does it cite a published paper that contains the dataset with proper bibliographic information. |
| Dataset Splits | No | The paper mentions 'training dataset' and training for a certain number of hours, but it does not specify any explicit training/validation/test dataset splits, percentages, or absolute sample counts for data partitioning. |
| Hardware Specification | Yes | We use 8 NVIDIA P40 GPUs for about 26 hours of training, and the batch size of each GPU is set to 512. MGG and other RL methods adopt self-play training and train by randomly selecting heroes over a physical computer cluster with 60,000 CPUs and 830 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using Adam (Kingma and Ba [2014]) as the optimizer but does not specify any other software components (e.g., programming languages, libraries, frameworks) with specific version numbers. |
| Experiment Setup | Yes | We set α = 0.75, γ = 2 for the focal loss L_FL (Lin et al. [2017]), and set λ = 1 for the weight of the auxiliary task. We use Adam with an initial learning rate of 0.0001. ...the batch size of each GPU is set to 512... The batch size of each GPU is set to 4096. ...the ∆C is 30 seconds and the noise ϵ is 3 seconds. (A hedged code sketch of this loss setup follows the table.) |
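
The Experiment Setup row quotes a focal-loss objective with α = 0.75, γ = 2, an auxiliary-task weight λ = 1, and Adam with an initial learning rate of 0.0001. The sketch below is a minimal PyTorch-style illustration of that setup, not the authors' released code: the `focal_loss` function, the stand-in linear model, the number of macro-goal classes, and the `aux_loss` term are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    """Multi-class focal loss (Lin et al., 2017) with the alpha/gamma values
    quoted in the paper; the reduction and class weighting are assumptions."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability and probability assigned to the ground-truth class.
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    pt = log_pt.exp()
    # Down-weight well-classified samples by (1 - pt)^gamma, scale by alpha.
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()


if __name__ == "__main__":
    num_goals = 144  # placeholder number of macro-goal classes (assumption)
    features = torch.randn(8, num_goals)
    labels = torch.randint(0, num_goals, (8,))

    # Hypothetical training step for a macro-goal classifier: focal loss plus
    # an auxiliary-task loss weighted by lambda = 1, optimized with Adam at
    # lr = 1e-4, per the quoted setup. A linear layer and a zero auxiliary
    # loss stand in for the real Meta-Controller network and auxiliary head.
    model = torch.nn.Linear(num_goals, num_goals)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    goal_logits = model(features)
    aux_loss = torch.tensor(0.0)  # stand-in auxiliary-task loss
    total_loss = focal_loss(goal_logits, labels) + 1.0 * aux_loss
    total_loss.backward()
    optimizer.step()
    print(f"total loss: {total_loss.item():.4f}")
```

Only the scalar hyperparameters above are taken from the quoted text; the Meta-Controller architecture and its auxiliary task are described in the paper's figures rather than in released code.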