Google Research Football: A Novel Reinforcement Learning Environment

Authors: Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

AAAI 2020, pp. 4501-4510 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.
Researcher Affiliation | Collaboration | Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly; Google Research, Brain Team. Indicates equal authorship. Correspondence to Karol Kurach (kkurach@google.com). Student at Jagiellonian University, work done during internship at Google Brain.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. Figure 1: The Google Research Football Environment (github.com/google-research/football)
Open Datasets | Yes | We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.
Dataset Splits | Yes | The tuning of hyper-parameters is done using the easy scenario, and we follow the same protocol for all algorithms to ensure fairness of comparison. After tuning, for each of the six considered settings (three Football Benchmarks and two reward functions), we run five random seeds and average the results.
Hardware Specification | Yes | Technical Implementation & Performance. The Football Engine is written in highly optimized C++ code, allowing it to run on commodity machines both with and without GPU-based rendering enabled. This allows it to reach approximately 140 million steps per day on a single hexa-core machine (see Figure 3). Figure 3: Number of steps per day versus number of concurrent environments for the Football Engine on a hexa-core Intel Xeon W-2135 CPU at 3.70 GHz.
Software Dependencies | No | The paper mentions the OpenAI Gym API but does not specify its version number, nor any other software dependencies with the version numbers required for reproducibility.
Experiment Setup | Yes | Experimental Setup. As a reference, we provide benchmark results for three state-of-the-art reinforcement learning algorithms: PPO (Schulman et al. 2017) and IMPALA (Espeholt et al. 2018), which are popular policy gradient methods, and Ape-X DQN (Horgan et al. 2018), which is a modern DQN implementation. We run PPO in multiple processes on a single machine, while IMPALA and DQN are run on a distributed cluster with 500 and 150 actors respectively. In all benchmark experiments, we use the stacked Super Mini Map representation and the same network architecture. We consider both the SCORING and CHECKPOINT rewards. The tuning of hyper-parameters is done using the easy scenario, and we follow the same protocol for all algorithms to ensure fairness of comparison. After tuning, for each of the six considered settings (three Football Benchmarks and two reward functions), we run five random seeds and average the results. For the technical details of the training setup, the architecture, and the hyperparameters, we refer to the Appendix.
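
To make the Open Source Code, Software Dependencies, and Experiment Setup entries above concrete, the following is a minimal sketch (not taken from the paper) of driving the open-sourced environment through its OpenAI Gym-style API with a stacked Super Mini Map observation and the combined SCORING/CHECKPOINT reward. The package name gfootball and the exact argument names and values (env_name, representation, stacked, rewards) are assumptions based on the public github.com/google-research/football repository and may differ between releases.

import gfootball.env as football_env

# Assumed scenario name for the easy Football Benchmark; representation="extracted"
# with stacked=True is taken to correspond to the stacked Super Mini Map observation,
# and rewards="scoring,checkpoints" to CHECKPOINT shaping on top of the SCORING reward.
env = football_env.create_environment(
    env_name="11_vs_11_easy_stochastic",
    representation="extracted",
    stacked=True,
    rewards="scoring,checkpoints",
    render=False,
)

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a stand-in for PPO/IMPALA/Ape-X DQN
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)

Replacing the sampled action with a learned policy's action gives the interaction loop an agent would use during training and evaluation.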
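
The Hardware Specification entry quotes a throughput of roughly 140 million steps per day on a single hexa-core machine. A rough, single-process way to estimate throughput on one's own hardware is sketched below; this is an assumed measurement procedure, not the authors' benchmark script, and in practice several environments run concurrently as in Figure 3.

import time
import gfootball.env as football_env

env = football_env.create_environment(
    env_name="11_vs_11_easy_stochastic",  # assumed scenario name, as above
    representation="extracted",           # rendering disabled keeps the engine CPU-only
    render=False,
)
env.reset()

num_steps = 10_000
start = time.time()
for _ in range(num_steps):
    _, _, done, _ = env.step(env.action_space.sample())
    if done:
        env.reset()
elapsed = time.time() - start

steps_per_second = num_steps / elapsed
print(f"{steps_per_second:.0f} steps/s, ~{steps_per_second * 86_400 / 1e6:.1f}M steps/day per process")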