Towards Playing Full MOBA Games with Deep Reinforcement Learning
Authors: Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agents in the literature. |
| Researcher Affiliation | Collaboration | 1 Tencent AI Lab, Shenzhen, China 2 Tencent TiMi L1 Studio, Chengdu, China {dericye,beanchen,zivenwzhang,victchen,jerryyuan,leobliu,jaylahchen,ricardoliu,frankfhqiu,yannickyu,mailyyin,beishi,enginewang,francisshi,leonfu,willyang,jackiehuang}@tencent.com; wl2223@columbia.edu |
| Pseudocode | No | The paper describes algorithms such as Dual-clip PPO and Monte-Carlo tree search but does not include any pseudocode or clearly labeled algorithm blocks (a hedged sketch of the dual-clip objective is given after this table). |
| Open Source Code | No | The paper provides links to game videos (https://sourl.cn/NVwV6L) but does not include any explicit statement or link for the source code of the described methodology. |
| Open Datasets | No | The paper mentions using a 'match dataset of 30 million samples' and 'vast amount of human player data' from Honor of Kings, but provides no specific link, DOI, repository name, or formal citation for public access to these datasets. |
| Dataset Splits | No | The paper describes generating training data and using a match dataset, but it does not provide specific details on explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for the main reinforcement learning model. |
| Hardware Specification | No | The paper states the quantity of hardware used ('320 GPUs and 35,000 CPUs') but does not specify exact models or types (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | The paper mentions using Adam optimizer and Dual-clip PPO, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The mini-batch size per GPU card is 8192. ... LSTM unit sizes for teacher and final models are 512 and 1024, respectively. LSTM time step is 16 for all models. ... We use Adam [17] with initial learning rate 0.0001. For Dual-clip PPO, the two clipping hyperparameters ϵ and c are set as 0.2 and 3, respectively. The discount factor is set as 0.998. We use generalized advantage estimation (GAE) [27] for reward calculation, with λ = 0.95. (Illustrative sketches of the dual-clip objective and GAE follow this table.) |
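Since the paper reports Dual-clip PPO with ϵ = 0.2 and c = 3 but provides no pseudocode, here is a minimal NumPy sketch of the dual-clip surrogate, assuming the standard formulation in which the clipped PPO objective is lower-bounded by c·Â when the advantage is negative. The function name and array-based layout are hypothetical, not the authors' implementation.

```python
import numpy as np

def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample Dual-clip PPO surrogate objective (to be maximized).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s), per sample
    advantage: advantage estimates (e.g., from GAE)
    eps, c:    clipping hyperparameters (0.2 and 3 per the paper)
    """
    # Standard PPO clipped surrogate.
    surrogate = np.minimum(
        ratio * advantage,
        np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage,
    )
    # Dual clip: when the advantage is negative, bound the surrogate
    # from below by c * advantage, so a very large policy ratio cannot
    # drive the objective arbitrarily far negative.
    return np.where(advantage < 0.0,
                    np.maximum(surrogate, c * advantage),
                    surrogate)
```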
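Likewise, a hedged sketch of generalized advantage estimation using the reported discount factor 0.998 and λ = 0.95; the helper name and single-trajectory layout are assumptions for illustration only.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.998, lam=0.95):
    """GAE over one trajectory.

    rewards: r_0 ... r_{T-1}
    values:  V(s_0) ... V(s_T), including one bootstrap value for s_T
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Backward recursion: A_t = delta_t + gamma * lambda * A_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```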