Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
Authors: Shusheng Xu, Huaijie Wang, Yi Wu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, Mini RTS. Experiment results and human studies show that the RED policy is able to consistently follow human commands and, at the same time, achieve a higher win rate than the baselines. |
| Researcher Affiliation | Academia | Shusheng Xu¹, Huaijie Wang¹ and Yi Wu¹,²; ¹IIIS, Tsinghua University, Beijing, China; ²Shanghai Qi Zhi Institute, Shanghai, China; {xuss20, wanghuai19}@mails.tsinghua.edu.cn; jxwuyi@gmail.com |
| Pseudocode | No | The paper describes its method in prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We release our code and present more examples at https://sites.google.com/view/grounded-rl. |
| Open Datasets | Yes | Mini RTS [33] is a grid-world RL environment (Fig. 1) that distills the key features of complex real-time strategy games. It has two parties, a player (blue) controlled by a human/policy against a built-in script AI (red). The player controls units to collect resources, do construction and kill all the enemy units or destroy the enemy base to win a game. |
| Dataset Splits | Yes | Our dataset consists of training, validation and test sets split from the original Mini RTS data by random sampling. Their sizes are 50000, 5000, and 5000 respectively. |
| Hardware Specification | Yes | All experiments are performed on a single GPU (NVIDIA RTX 2080 Ti) with 11GB memory. |
| Software Dependencies | Yes | The system is implemented in Python 3.7 with PyTorch 1.10.0 and CUDA 11.3. |
| Experiment Setup | Yes | We use Adam optimizer [30] with learning rate 3e-4, eps 1e-5 and weight decay 0.01 for training. The batch size for policy update is 512, and mini-batch size is 32. We set entropy coefficient to 0.005. During RL training, we rollout 50 games for each update epoch, and train for 500 epochs. |
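
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a small training configuration. The sketch below is a minimal illustration in Python/PyTorch (matching the Software Dependencies row); the dataclass fields, the `make_optimizer` helper, and the placeholder policy network are assumptions for illustration only and are not taken from the released code.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class TrainConfig:
    # Values quoted in the Experiment Setup row; the field names are illustrative.
    lr: float = 3e-4
    adam_eps: float = 1e-5
    weight_decay: float = 0.01
    batch_size: int = 512          # samples per policy update
    mini_batch_size: int = 32      # size of mini-batches drawn from each batch
    entropy_coef: float = 0.005    # weight of the entropy bonus in the RL loss
    games_per_epoch: int = 50      # rollout games collected before each update epoch
    num_epochs: int = 500          # total RL training epochs


def make_optimizer(policy: nn.Module, cfg: TrainConfig) -> torch.optim.Adam:
    """Adam with the learning rate, eps and weight decay reported in the paper."""
    return torch.optim.Adam(
        policy.parameters(),
        lr=cfg.lr,
        eps=cfg.adam_eps,
        weight_decay=cfg.weight_decay,
    )


if __name__ == "__main__":
    # `policy` is a stand-in; the paper's actual network architecture is not
    # reproduced here.
    policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))
    optimizer = make_optimizer(policy, TrainConfig())
    print(optimizer)
```

In a PPO-style update, `entropy_coef` would weight an entropy bonus added to the policy loss; the code released at https://sites.google.com/view/grounded-rl is the authoritative reference for the exact training loop.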