Grounded Reinforcement Learning: Learning to Win the Game under Human Commands

Authors: Shusheng Xu, Huaijie Wang, Yi Wu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, Mini RTS. Experiment results and human studies show that the RED policy is able to consistently follow human commands and, at the same time, achieve a higher win rate than the baselines."
Researcher Affiliation | Academia | "Shusheng Xu (1), Huaijie Wang (1) and Yi Wu (1,2); (1) IIIS, Tsinghua University, Beijing, China; (2) Shanghai Qi Zhi Institute, Shanghai, China; {xuss20, wanghuai19}@mails.tsinghua.edu.cn; jxwuyi@gmail.com"
Pseudocode | No | The paper describes its method in prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | "We release our code and present more examples at https://sites.google.com/view/grounded-rl."
Open Datasets | Yes | "Mini RTS [33] is a grid-world RL environment (Fig. 1) that distills the key features of complex real-time strategy games. It has two parties, a player (blue) controlled by a human/policy against a built-in script AI (red). The player controls units to collect resources, do construction and kill all the enemy units or destroy the enemy base to win a game."
Dataset Splits | Yes | "Our dataset consists of training, validation and test sets split from the original Mini RTS data by random sampling. Their sizes are 50000, 5000, and 5000 respectively." (see the split sketch after the table)
Hardware Specification | Yes | "All experiments are performed on a single GPU (NVIDIA RTX 2080 Ti) with 11GB memory."
Software Dependencies | Yes | "The system is implemented in Python 3.7 with PyTorch 1.10.0 and CUDA 11.3."
Experiment Setup | Yes | "We use Adam optimizer [30] with learning rate 3e-4, eps 1e-5 and weight decay 0.01 for training. The batch size for policy update is 512, and mini-batch size is 32. We set entropy coefficient to 0.005. During RL training, we rollout 50 games for each update epoch, and train for 500 epochs." (see the configuration sketch after the table)
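
The Dataset Splits row reports a 50000/5000/5000 random split of the Mini RTS data. A minimal sketch of how such a split could be drawn is given below; the function name split_dataset, the seed value, and the samples argument are illustrative assumptions, not the authors' code.

    import random

    # Sketch of the reported split: 50000 train / 5000 validation / 5000 test,
    # drawn by random sampling from the pooled Mini RTS data.
    def split_dataset(samples, n_train=50000, n_val=5000, n_test=5000, seed=0):
        assert len(samples) >= n_train + n_val + n_test
        rng = random.Random(seed)  # fixed seed is an assumption for reproducibility
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        val = shuffled[n_train:n_train + n_val]
        test = shuffled[n_train + n_val:n_train + n_val + n_test]
        return train, val, test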
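
The Experiment Setup row maps directly onto a PyTorch optimizer configuration. The sketch below uses a placeholder policy network (a single torch.nn.Linear layer), since the report does not quote the actual architecture; only the optimizer hyperparameters and batching constants come from the paper.

    import torch

    # Placeholder policy network; the real architecture is not quoted in this report.
    policy = torch.nn.Linear(128, 16)

    # Adam with the reported hyperparameters: lr 3e-4, eps 1e-5, weight decay 0.01.
    optimizer = torch.optim.Adam(
        policy.parameters(),
        lr=3e-4,
        eps=1e-5,
        weight_decay=0.01,
    )

    # Reported batching and training schedule: 512 samples per policy update,
    # mini-batches of 32, entropy coefficient 0.005, 50 rollout games per
    # update epoch, 500 epochs in total.
    BATCH_SIZE = 512
    MINI_BATCH_SIZE = 32
    ENTROPY_COEF = 0.005
    ROLLOUT_GAMES_PER_EPOCH = 50
    NUM_EPOCHS = 500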