reproducibilityindex.ai

Grounded Reinforcement Learning: Learning to Win the Game under Human Commands

Authors: Shusheng Xu, Huaijie Wang, Yi Wu

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, Mini RTS. Experiment results and human studies show that the RED policy is able to consistently follow human commands and, at the same time, achieve a higher win rate than the baselines.
Researcher Affiliation	Academia	Shusheng Xu (1), Huaijie Wang (1) and Yi Wu (1, 2). (1) IIIS, Tsinghua University, Beijing, China; (2) Shanghai Qi Zhi Institute, Shanghai, China. Emails: {xuss20, wanghuai19}@mails.tsinghua.edu.cn, jxwuyi@gmail.com
Pseudocode	No	The paper describes its method in prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	We release our code and present more examples at https://sites.google.com/view/grounded-rl.
Open Datasets	Yes	Mini RTS [33] is a grid-world RL environment (Fig. 1) that distills the key features of complex real-time strategy games. It has two parties, a player (blue) controlled by a human/policy against a built-in script AI (red). The player controls units to collect resources, do construction and kill all the enemy units or destroy the enemy base to win a game.
Dataset Splits	Yes	Our dataset consists of training, validation and test sets split from the original Mini RTS data by random sampling. Their sizes are 50000, 5000, and 5000 respectively.
Hardware Specification	Yes	All experiments are performed on a single GPU (NVIDIA RTX 2080 Ti) with 11GB memory.
Software Dependencies	Yes	The system is implemented in Python 3.7 with PyTorch 1.10.0 and CUDA 11.3.
Experiment Setup	Yes	We use Adam optimizer [30] with learning rate 3e-4, eps 1e-5 and weight decay 0.01 for training. The batch size for policy update is 512, and mini-batch size is 32. We set the entropy coefficient to 0.005. During RL training, we roll out 50 games for each update epoch, and train for 500 epochs.
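The reported split sizes and hyperparameters can be collected into a small configuration sketch. This is not the authors' released code. The names `TrainConfig`, `random_split`, `minibatches` and `regularized_loss` are hypothetical, and so is the pool of 60,000 sample indices, which is only a stand-in because the size of the original Mini RTS data is not stated here. The sign convention for the entropy term (subtracting an entropy bonus from the policy loss) is a common choice and is assumed, not confirmed by the table.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the table; the field names are hypothetical.
    lr: float = 3e-4
    adam_eps: float = 1e-5
    weight_decay: float = 0.01
    batch_size: int = 512        # samples per policy update
    minibatch_size: int = 32     # samples per gradient step
    entropy_coef: float = 0.005
    games_per_epoch: int = 50    # rollouts collected per update epoch
    num_epochs: int = 500


def random_split(indices, sizes, seed=0):
    """Shuffle the indices, then cut consecutive, disjoint chunks of the given sizes."""
    pool = list(indices)
    if sum(sizes) > len(pool):
        raise ValueError("requested split sizes exceed the number of samples")
    random.Random(seed).shuffle(pool)
    splits, start = [], 0
    for n in sizes:
        splits.append(pool[start:start + n])
        start += n
    return splits


def minibatches(batch, minibatch_size):
    """Yield consecutive mini-batches from one policy-update batch."""
    for i in range(0, len(batch), minibatch_size):
        yield batch[i:i + minibatch_size]


def regularized_loss(policy_loss, entropy, entropy_coef):
    # Assumed convention: subtract an entropy bonus so that minimizing
    # the loss favors higher-entropy (more exploratory) policies.
    return policy_loss - entropy_coef * entropy


cfg = TrainConfig()
# Hypothetical pool of 60,000 indices standing in for the original data.
train, val, test = random_split(range(60000), (50000, 5000, 5000))
num_minibatches = cfg.batch_size // cfg.minibatch_size   # gradient steps per update
total_games = cfg.games_per_epoch * cfg.num_epochs       # games rolled out in total
```

With these values, each policy update takes 512 / 32 = 16 mini-batch steps, and RL training rolls out 50 × 500 = 25,000 games in total. In PyTorch, the optimizer settings would map onto `torch.optim.Adam(params, lr=cfg.lr, eps=cfg.adam_eps, weight_decay=cfg.weight_decay)`. Note that `weight_decay` in `torch.optim.Adam` is an L2 penalty added to the gradient, not the decoupled decay of `AdamW`.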