Towards Playing Full MOBA Games with Deep Reinforcement Learning
Authors: Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agents in the literature. |
| Researcher Affiliation | Collaboration | 1 Tencent AI Lab, Shenzhen, China 2 Tencent TiMi L1 Studio, Chengdu, China {dericye,beanchen,zivenwzhang,victchen,jerryyuan,leobliu,jaylahchen,ricardoliu,frankfhqiu,yannickyu,mailyyin,beishi,enginewang,francisshi,leonfu,willyang,jackiehuang}@tencent.com; wl2223@columbia.edu |
| Pseudocode | No | The paper describes algorithms such as Dual-clip PPO and Monte-Carlo tree search but does not include any pseudocode or clearly labeled algorithm blocks (a hedged sketch of the dual-clip objective is given after this table). |
| Open Source Code | No | The paper provides links to game videos (https://sourl.cn/NVwV6L) but does not include any explicit statement or link for the source code of the described methodology. |
| Open Datasets | No | The paper mentions using a 'match dataset of 30 million samples' and 'vast amount of human player data' from Honor of Kings, but provides no specific link, DOI, repository name, or formal citation for public access to these datasets. |
| Dataset Splits | No | The paper describes generating training data and using a match dataset, but it does not provide specific details on explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for the main reinforcement learning model. |
| Hardware Specification | No | The paper states the quantity of hardware used ('320 GPUs and 35,000 CPUs') but does not specify exact models or types (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | The paper mentions using Adam optimizer and Dual-clip PPO, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The mini-batch size per GPU card is 8192. ... LSTM unit sizes for teacher and final models are 512 and 1024, respectively. LSTM time step is 16 for all models. ... We use Adam [17] with initial learning rate 0.0001. For Dual-clip PPO, the two clipping hyperparameters ϵ and c are set as 0.2 and 3, respectively. The discount factor is set as 0.998. We use generalized advantage estimation (GAE) [27] for reward calculation, with λ = 0.95. (Illustrative sketches of the dual-clip objective and GAE follow this table.) |
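Since the paper reports Dual-clip PPO with ϵ = 0.2 and c = 3 but provides no pseudocode, here is a minimal NumPy sketch of the dual-clip surrogate, assuming the standard formulation in which the clipped PPO objective is lower-bounded by c·Â when the advantage is negative. The function name and array-based layout are hypothetical, not the authors' implementation.

```python
import numpy as np

def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample Dual-clip PPO surrogate objective (to be maximized).

    ratio:     pi_theta(a|s) / pi_theta_old(a|s), per sample
    advantage: advantage estimates (e.g., from GAE)
    eps, c:    clipping hyperparameters (0.2 and 3 per the paper)
    """
    # Standard PPO clipped surrogate.
    surrogate = np.minimum(
        ratio * advantage,
        np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage,
    )
    # Dual clip: when the advantage is negative, bound the surrogate
    # from below by c * advantage, so a very large policy ratio cannot
    # drive the objective arbitrarily far negative.
    return np.where(advantage < 0.0,
                    np.maximum(surrogate, c * advantage),
                    surrogate)
```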
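Likewise, a hedged sketch of generalized advantage estimation using the reported discount factor 0.998 and λ = 0.95; the helper name and single-trajectory layout are assumptions for illustration only.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.998, lam=0.95):
    """GAE over one trajectory.

    rewards: r_0 ... r_{T-1}
    values:  V(s_0) ... V(s_T), including one bootstrap value for s_T
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Backward recursion: A_t = delta_t + gamma * lambda * A_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```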