Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
Authors: Shusheng Xu, Huaijie Wang, Yi Wu
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, Mini RTS. Experiment results and human studies show that the RED policy is able to consistently follow human commands and, at the same time, achieve a higher win rate than the baselines. |
| Researcher Affiliation | Academia | Shusheng Xu¹, Huaijie Wang¹ and Yi Wu¹,² (¹ IIIS, Tsinghua University, Beijing, China; ² Shanghai Qi Zhi Institute, Shanghai, China) |
| Pseudocode | No | The paper describes its method in prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We release our code and present more examples at https://sites.google.com/view/grounded-rl. |
| Open Datasets | Yes | Mini RTS [33] is a grid-world RL environment (Fig. 1) that distills the key features of complex real-time strategy games. It has two parties, a player (blue) controlled by a human/policy against a built-in script AI (red). The player controls units to collect resources, do construction and kill all the enemy units or destroy the enemy base to win a game. |
| Dataset Splits | Yes | Our dataset consists of training, validation and test sets split from the original Mini RTS data by random sampling. Their sizes are 50000, 5000, and 5000 respectively. |
| Hardware Specification | Yes | All experiments are performed on a single GPU (NVIDIA RTX 2080 Ti) with 11GB memory. |
| Software Dependencies | Yes | The system is implemented in Python 3.7 with PyTorch 1.10.0 and CUDA 11.3. |
| Experiment Setup | Yes | We use Adam optimizer [30] with learning rate 3e-4, eps 1e-5 and weight decay 0.01 for training. The batch size for policy update is 512, and mini-batch size is 32. We set entropy coefficient to 0.005. During RL training, we rollout 50 games for each update epoch, and train for 500 epochs. |
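
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which makes the derived quantities easy to check. The following is a minimal sketch, not the authors' code: the class name `TrainConfig`, its field names, and its helper methods are hypothetical, and the assumption that each batch of 512 is split evenly into mini-batches of 32 is an interpretation of the quoted text rather than something the table confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""

    learning_rate: float = 3e-4   # Adam learning rate
    adam_eps: float = 1e-5        # Adam epsilon
    weight_decay: float = 0.01    # Adam weight decay
    batch_size: int = 512         # samples per policy update
    mini_batch_size: int = 32     # samples per gradient step
    entropy_coef: float = 0.005   # entropy bonus coefficient
    games_per_epoch: int = 50     # rollout games per update epoch
    num_epochs: int = 500         # RL training epochs

    def mini_batches_per_update(self) -> int:
        # Assumption: each batch is split evenly into mini-batches.
        if self.batch_size % self.mini_batch_size != 0:
            raise ValueError("batch_size must be a multiple of mini_batch_size")
        return self.batch_size // self.mini_batch_size

    def total_rollout_games(self) -> int:
        # Games played over the whole RL training run.
        return self.games_per_epoch * self.num_epochs

    def adam_kwargs(self) -> dict:
        # Keys follow the keyword names accepted by torch.optim.Adam.
        return {
            "lr": self.learning_rate,
            "eps": self.adam_eps,
            "weight_decay": self.weight_decay,
        }


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.mini_batches_per_update())  # 512 // 32
    print(cfg.total_rollout_games())      # 50 * 500
    print(cfg.adam_kwargs())
```

Under these assumptions, one policy update corresponds to 16 mini-batch steps and the full run covers 25,000 rollout games. With PyTorch installed, the dictionary returned by `adam_kwargs()` could be unpacked into `torch.optim.Adam(model.parameters(), **cfg.adam_kwargs())`, where `model` stands for whatever policy network is being trained.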