On Reinforcement Learning for Full-Length Game of StarCraft

Authors: Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu (pp. 4691-4698)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II. ...On a 64×64 map and using restrictive units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93% winning rate against the most difficult non-cheating built-in AI (level-7) within days. ...Experimental results achieved in several difficult levels of full-length games on SC2LE illustrate the effectiveness of our method.
Researcher Affiliation | Academia | Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. Emails: pangzj@lamda.nju.edu.cn, liuruoze@163.com, misskanagi@gmail.com, zhangyi@smail.nju.edu.cn, yuy@nju.edu.cn, lutong@nju.edu.cn
Pseudocode | Yes | Algorithm 1: HRL training algorithm. (A hedged sketch of this hierarchical loop appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described.
Open Datasets | Yes | SC2LE is a new research learning environment based on StarCraft II, which is the follow-up of StarCraft. ...In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II (SC2). ...(Vinyals et al. 2017) is cited in the references as the source for SC2LE.
Dataset Splits | No | The paper trains against different difficulty levels of built-in AI but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or test sets.
Hardware Specification | No | On a single machine with 4 GPUs and 48 CPU threads. ...Moreover, we use less computing resources, which is a single machine with 4 GPUs and 48 CPU threads. This specifies quantities but not the specific models or types of GPUs/CPUs.
Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used.
Experiment Setup | Yes | The setting of our architecture is as follows: the controller selects one sub-policy every 8 seconds, and the sub-policy performs macro-actions every 1 second. ...We set the maximum length of each game to 15 minutes. ...The update algorithm we use is PPO (Schulman et al. 2017). An entropy loss was added to the PPO loss calculation to encourage exploration. Therefore, our loss formula is as follows: $L_t(\theta) = \hat{\mathbb{E}}_t\left[ L^{clip}_t(\theta) + c_1 L^{vf}_t(\theta) + c_2 S[\pi_\theta](s_t) \right]$. The learning rate is mentioned only in the context of training stability. (A hedged sketch of this loss appears after the table.)
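
The pseudocode row above references the paper's Algorithm 1 (HRL training algorithm), and the experiment-setup row gives its timing: the controller chooses a sub-policy every 8 seconds, the chosen sub-policy emits one macro-action per second, and games are capped at 15 minutes. The following is a minimal sketch of that hierarchical loop only; every name here (RandomPolicy, env.reset, env.step, and the (obs, reward, done) return convention) is a hypothetical stand-in, not the paper's actual interface.

```python
import random

CONTROLLER_INTERVAL = 8   # seconds between controller (high-level) decisions
MACRO_INTERVAL = 1        # seconds between macro-actions (low-level)
EPISODE_LIMIT = 15 * 60   # 15-minute cap on each game, as stated in the paper

class RandomPolicy:
    """Placeholder policy; the paper trains these networks with PPO."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, obs):
        return random.randrange(self.n_actions)

def run_episode(env, controller, sub_policies):
    obs = env.reset()
    t, total_reward = 0, 0.0
    while t < EPISODE_LIMIT:
        # High level: the controller picks which sub-policy acts for the next 8 s.
        k = controller.act(obs)
        for _ in range(CONTROLLER_INTERVAL // MACRO_INTERVAL):
            # Low level: the selected sub-policy emits one macro-action per second.
            obs, reward, done = env.step(sub_policies[k].act(obs))
            total_reward += reward
            t += MACRO_INTERVAL
            if done:
                return total_reward
    return total_reward
```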
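
The experiment-setup row quotes the combined PPO objective $L_t(\theta) = \hat{\mathbb{E}}_t[L^{clip}_t(\theta) + c_1 L^{vf}_t(\theta) + c_2 S[\pi_\theta](s_t)]$. Below is a minimal PyTorch sketch of that formula; the coefficient values and the choice to define the value term as a negative squared error (so that all three terms can be maximized together) are assumptions, since the paper does not report them.

```python
import torch

def ppo_objective(log_probs, old_log_probs, advantages,
                  values, returns, entropy,
                  clip_eps=0.2, c1=0.5, c2=0.01):  # assumed coefficient values
    """Clipped surrogate + value term + entropy bonus, per the quoted formula."""
    ratio = torch.exp(log_probs - old_log_probs)   # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    l_clip = torch.min(unclipped, clipped).mean()  # L^clip_t(theta)
    l_vf = -((values - returns) ** 2).mean()       # assumed: negative MSE value term
    s = entropy.mean()                             # S[pi_theta](s_t), exploration bonus
    # The quoted formula sums the three terms; negate the result to use it as a
    # minimization loss with a standard optimizer.
    return l_clip + c1 * l_vf + c2 * s
```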