Learning Diverse Policies in MOBA Games via Macro-Goals

Authors: Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, Lanxiao Huang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on the typical MOBA game Honor of Kings demonstrate that MGG can execute diverse policies in different matches and lineups, and also outperform the state-of-the-art methods over 102 heroes."
Researcher Affiliation | Industry | 1) Tencent AI Lab, Shenzhen, China; 2) Tencent TiMi L1 Studio, Chengdu, China. {yatminggao, beishi, sherinedu, enginewang, gorvinchen, leolian, frankfhqiu, guoanhan, waihinwang, dericye, leonfu, willyang, jackiehuang}@tencent.com
Pseudocode | No | The paper includes figures illustrating the framework and network architecture (Figures 1, 4, and 5) but does not contain any formal pseudocode or algorithm blocks with structured, code-like steps.
Open Source Code | No | The paper does not include an unambiguous statement about releasing source code for the methodology, nor does it provide a direct link to a code repository.
Open Datasets | No | "We construct a training dataset by collecting replays from the top 1% human players to train the Meta-Controller." The paper does not provide a link, DOI, or specific repository name for this custom-collected dataset, nor does it cite a published paper that contains the dataset with proper bibliographic information.
Dataset Splits | No | The paper mentions a 'training dataset' and training for a certain number of hours, but it does not specify any explicit training/validation/test dataset splits, percentages, or absolute sample counts for data partitioning.
Hardware Specification | Yes | "We use 8 NVIDIA P40 GPUs for about 26 hours of training, and the batch size of each GPU is set to 512." "MGG and other RL methods adopt self-play training and train by randomly selecting heroes over a physical computer cluster with 60,000 CPUs and 830 NVIDIA V100 GPUs."
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba [2014]) as the optimizer but does not specify any other software components (e.g., programming languages, libraries, frameworks) with specific version numbers.
Experiment Setup | Yes | "We set α = 0.75, γ = 2 for focal loss L_FL (Lin et al. [2017]), and set λ = 1 for the weight of the auxiliary task. We use Adam with the initial learning rate of 0.0001." "...the batch size of each GPU is set to 512..." "The batch size of each GPU is set to 4096." "...the delta C is 30 seconds and the noise ϵ is 3 seconds."
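
For context on the reported Experiment Setup values, the sketch below shows one way the Meta-Controller's supervised objective could be assembled from the quoted settings: focal loss with α = 0.75 and γ = 2 following Lin et al. [2017], auxiliary-task weight λ = 1, and Adam with an initial learning rate of 0.0001. The paper releases no code and does not name its framework, so PyTorch is assumed here; identifiers such as `focal_loss`, `training_step`, and `aux_loss_fn` are illustrative, the model sizes are placeholders, and the additive combination of the focal and auxiliary losses is an assumption rather than something stated in the paper.

```python
# Minimal sketch of the quoted supervised setup (assumed PyTorch; names are illustrative).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.75, gamma=2.0):
    # Multi-class focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), per Lin et al. [2017].
    log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

# Placeholder Meta-Controller head: 128 input features, 44 macro-goal classes (illustrative sizes).
model = torch.nn.Linear(128, 44)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # "initial learning rate of 0.0001"

def training_step(features, macro_goal_labels, aux_loss_fn=None, lam=1.0):
    # One supervised step; lam = 1 mirrors the quoted auxiliary-task weight, and the
    # combination L = L_FL + lam * L_aux is an assumed reading of that weight.
    logits = model(features)
    loss = focal_loss(logits, macro_goal_labels)
    if aux_loss_fn is not None:
        loss = loss + lam * aux_loss_fn(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step using the per-GPU batch size of 512 quoted in the Hardware Specification row.
features = torch.randn(512, 128)
macro_goal_labels = torch.randint(0, 44, (512,))
print(training_step(features, macro_goal_labels))
```

In this reading, the per-GPU batch size of 512 quoted for the 8 P40 GPUs is simply the number of samples fed to each replica per step, while the separately quoted batch size of 4096 belongs to the other (self-play RL) training stage described in the Hardware Specification row.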