On Reinforcement Learning for Full-Length Game of StarCraft
Authors: Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu (pp. 4691-4698)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II. ...On a 64x64 map and using restrictive units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat model, we can achieve over 93% winning rate against the most difficult non-cheating built-in AI (level-7) within days. ...Experimental results achieved in several difficult levels of full-length games on SC2LE illustrate the effectiveness of our method. |
| Researcher Affiliation | Academia | Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China pangzj@lamda.nju.edu.cn, liuruoze@163.com, misskanagi@gmail.com, zhangyi@smail.nju.edu.cn, yuy@nju.edu.cn, lutong@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 HRL training algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | SC2LE is a new research learning environment based on StarCraft II, which is the follow-up of StarCraft. ...In this paper, we investigate a set of techniques of reinforcement learning for the full-length game of StarCraft II (SC2). ...(Vinyals et al. 2017) is cited in the references as the source for SC2LE. |
| Dataset Splits | No | The paper trains against different difficulty levels of built-in AI but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or test sets. |
| Hardware Specification | No | On a single machine with 4 GPUs and 48 CPU threads. ...Moreover, we use less computing resources, which is a single machine with 4 GPUs and 48 CPU threads. This specifies quantities but not specific models or types of GPUs/CPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used. |
| Experiment Setup | Yes | The setting of our architecture is as follows: the controller selects one sub-policy every 8 seconds, and the sub-policy performs macro-actions every 1 second. ...We set the maximum length of each game to 15 minutes. ...The update algorithm we use is PPO (Schulman et al. 2017). An entropy loss was added to the PPO loss calculation to encourage exploration. Therefore, our loss formula is as follows: $L_t(\theta) = \hat{\mathbb{E}}_t\left[L^{clip}_t(\theta) + c_1 L^{vf}_t(\theta) + c_2 S[\pi_\theta](s_t)\right]$ ...The learning rate is mentioned in the context of stability. (See the hedged sketches after the table.) |
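
The experiment setup describes a two-level control scheme: a controller picks a sub-policy every 8 in-game seconds, the chosen sub-policy issues one macro-action per second, and each game is capped at 15 minutes. The loop below is a minimal sketch of that timing structure only; the `env`, `controller`, and `sub_policies` interfaces are hypothetical stand-ins, not the authors' SC2LE wrappers or their Algorithm 1.

```python
# Hedged sketch of the hierarchical control loop implied by the quoted setup.
# All interfaces (env.reset/step, controller.select, sub_policy.act) are
# assumed for illustration; only the 8 s / 1 s / 15 min timing comes from the paper.
CONTROLLER_INTERVAL = 8      # seconds between controller (sub-policy) decisions
MACRO_ACTION_INTERVAL = 1    # seconds between macro-actions
MAX_GAME_SECONDS = 15 * 60   # 15-minute cap per game

def run_episode(env, controller, sub_policies):
    obs, done, t, reward = env.reset(), False, 0, 0.0
    while not done and t < MAX_GAME_SECONDS:
        # High-level decision: which sub-policy controls the next 8 seconds.
        k = controller.select(obs)
        for _ in range(CONTROLLER_INTERVAL // MACRO_ACTION_INTERVAL):
            # Low-level decision: one macro-action per second.
            obs, reward, done = env.step(sub_policies[k].act(obs))
            t += MACRO_ACTION_INTERVAL
            if done:
                break
    return reward
```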
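
The quoted loss combines the PPO clipped surrogate, a value-function term, and an entropy term. The function below is a minimal PyTorch sketch of that combination; the clip range, the coefficients `c1` and `c2`, and the sign conventions are assumptions, since the excerpt does not report their values.

```python
# Hedged sketch of a PPO loss with an entropy bonus, following
# L_t = E_t[ L_clip + c1 * L_vf + c2 * S[pi_theta](s_t) ] from the quoted setup.
# clip_eps, c1, c2 are assumed defaults, not values from the paper.
import torch

def ppo_loss(new_log_probs, old_log_probs, advantages,
             values, returns, entropy,
             clip_eps=0.2, c1=0.5, c2=0.01):
    """Combined PPO loss (to be minimized) for one batch of transitions."""
    # Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Clipped surrogate objective, negated so minimizing the loss maximizes it.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value-function term L_vf.
    value_loss = (values - returns).pow(2).mean()

    # Entropy term S[pi_theta](s_t), subtracted to encourage exploration.
    entropy_bonus = entropy.mean()

    return policy_loss + c1 * value_loss - c2 * entropy_bonus
```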