On Reinforcement Learning for Full-Length Game of StarCraft
Authors: Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu (pp. 4691-4698)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II. ...On a 64x64 map and using restrictive units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93% winning rate against the most difficult non-cheating built-in AI (level-7) within days. ...Experimental results achieved in several difficult levels of full-length games on SC2LE illustrate the effectiveness of our method. |
| Researcher Affiliation | Academia | Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China pangzj@lamda.nju.edu.cn, liuruoze@163.com, misskanagi@gmail.com, zhangyi@smail.nju.edu.cn, yuy@nju.edu.cn, lutong@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 HRL training algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | SC2LE is a new research learning environment based on StarCraft II, which is the follow-up of StarCraft. ...In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II (SC2). ...(Vinyals et al. 2017) is cited in the references as the source for SC2LE. |
| Dataset Splits | No | The paper trains against different difficulty levels of built-in AI but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or test sets. |
| Hardware Specification | No | On a single machine with 4 GPUs and 48 CPU threads. ...Moreover, we use less computing resources, which is a single machine with 4 GPUs and 48 CPU threads. This specifies quantities but not specific models or types of GPUs/CPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used. |
| Experiment Setup | Yes | The setting of our architecture is as follows: the controller selects one sub-policy every 8 seconds, and the sub-policy performs macro-actions every 1 second. ...We set the maximum length of each game to 15 minutes. ...The update algorithm we use is PPO (Schulman et al. 2017). An entropy loss was added to the PPO loss calculation to encourage exploration. Therefore, our loss formula is as follows: $L_t(\theta) = \hat{\mathbb{E}}_t\left[L^{clip}_t(\theta) + c_1 L^{vf}_t(\theta) + c_2 S[\pi_\theta](s_t)\right]$ ...The learning rate is mentioned in the context of stability. (See the hedged sketches after the table.) |
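
The experiment setup describes a two-level control scheme: a controller picks a sub-policy every 8 in-game seconds, the chosen sub-policy issues one macro-action per second, and each game is capped at 15 minutes. The loop below is a minimal sketch of that timing structure only; the `env`, `controller`, and `sub_policies` interfaces are hypothetical stand-ins, not the authors' SC2LE wrappers or their Algorithm 1.

```python
# Hedged sketch of the hierarchical control loop implied by the quoted setup.
# All interfaces (env.reset/step, controller.select, sub_policy.act) are
# assumed for illustration; only the 8 s / 1 s / 15 min timing comes from the paper.
CONTROLLER_INTERVAL = 8      # seconds between controller (sub-policy) decisions
MACRO_ACTION_INTERVAL = 1    # seconds between macro-actions
MAX_GAME_SECONDS = 15 * 60   # 15-minute cap per game

def run_episode(env, controller, sub_policies):
    obs, done, t, reward = env.reset(), False, 0, 0.0
    while not done and t < MAX_GAME_SECONDS:
        # High-level decision: which sub-policy controls the next 8 seconds.
        k = controller.select(obs)
        for _ in range(CONTROLLER_INTERVAL // MACRO_ACTION_INTERVAL):
            # Low-level decision: one macro-action per second.
            obs, reward, done = env.step(sub_policies[k].act(obs))
            t += MACRO_ACTION_INTERVAL
            if done:
                break
    return reward
```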
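
The quoted loss combines the PPO clipped surrogate, a value-function term, and an entropy term. The function below is a minimal PyTorch sketch of that combination; the clip range, the coefficients `c1` and `c2`, and the sign conventions are assumptions, since the excerpt does not report their values.

```python
# Hedged sketch of a PPO loss with an entropy bonus, following
# L_t = E_t[ L_clip + c1 * L_vf + c2 * S[pi_theta](s_t) ] from the quoted setup.
# clip_eps, c1, c2 are assumed defaults, not values from the paper.
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages,
             values, returns, entropy,
             clip_eps=0.2, c1=0.5, c2=0.01):
    """Combined PPO loss (to be minimized) for one batch of transitions."""
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Clipped surrogate objective, negated so minimizing the loss maximizes it.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value-function term L_vf.
    value_loss = (values - returns).pow(2).mean()

    # Entropy term S[pi_theta](s_t), subtracted to encourage exploration.
    entropy_bonus = entropy.mean()

    return policy_loss + c1 * value_loss - c2 * entropy_bonus
```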