Deep Reinforcement Learning for Navigation in AAA Video Games

Authors: Eloi Alonso, Maxim Peter, David Goumard, Joshua Romoff

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our approach on complex 3D environments that are notably an order of magnitude larger than maps typically used in the Deep RL literature. One of these environments is from a recently released AAA video game called Hyper Scape. We find that our approach performs surprisingly well, achieving at least 90% success rate in a variety of scenarios using complex navigation abilities.
Researcher Affiliation | Collaboration | Eloi Alonso, Maxim Peter, David Goumard and Joshua Romoff, Ubisoft La Forge. eloi.alonso@unige.ch, {maxim.peter, david.goumard, joshua.romoff}@ubisoft.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It links to videos demonstrating the results, but not to a code repository.
Open Datasets | No | The paper mentions training on a 'Toy Map' and a 'Big Map' created in Unity, as well as a map from 'Hyper Scape', but these environments are not provided as publicly accessible datasets with concrete access information (links, DOIs, etc.).
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and test sets). For RL, data is generated through environment interaction, and a curriculum is used for training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'Unity game engine', 'Soft Actor-Critic (SAC)', and the 'Adam optimizer', but it does not provide specific version numbers for these or other software components.
Experiment Setup | Yes | During training, we spawn the agent and its goal in a cylinder with a variable radius. An episode is considered over when the agent has reached its goal or when the number of steps is over a certain budget. To allow the agent to have informative trajectories at all stages during training, we use a training curriculum [Bengio et al., 2009] and increase the radius of the spawning cylinder until the full map is covered. Specifically, when the agent achieves a success rate of > 80% over the last 200 episodes we increase the spawning radius of the goal. All the networks are trained using the Adam optimizer [Kingma and Ba, 2014]. More details on our hyperparameters can be found in the supplementary material, and Figure 3 describes our architecture.
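To make the quoted curriculum concrete, below is a minimal Python sketch of the spawn-radius schedule it describes: the cylinder radius grows once the success rate over the last 200 episodes exceeds 80%. The 80% threshold and 200-episode window come from the paper; the class name, radius increment, initial and maximum radii, and the decision to reset the window after each increase are illustrative assumptions, not the authors' implementation.

from collections import deque

class SpawnCurriculum:
    """Sketch of the spawn-radius curriculum; thresholds from the paper, other values assumed."""

    def __init__(self, initial_radius=5.0, max_radius=500.0,
                 radius_increment=5.0, window=200, success_threshold=0.8):
        self.radius = initial_radius
        self.max_radius = max_radius
        self.radius_increment = radius_increment
        self.success_threshold = success_threshold
        # 1.0 = agent reached the goal, 0.0 = step budget exceeded
        self.recent_outcomes = deque(maxlen=window)

    def record_episode(self, reached_goal: bool) -> None:
        """Log an episode outcome and widen the spawning cylinder when warranted."""
        self.recent_outcomes.append(1.0 if reached_goal else 0.0)
        if len(self.recent_outcomes) == self.recent_outcomes.maxlen:
            success_rate = sum(self.recent_outcomes) / len(self.recent_outcomes)
            if success_rate > self.success_threshold and self.radius < self.max_radius:
                self.radius = min(self.radius + self.radius_increment, self.max_radius)
                # Assumed behavior: restart the window at the new difficulty level.
                self.recent_outcomes.clear()

    def spawn_radius(self) -> float:
        """Current radius of the cylinder in which the agent and its goal are spawned."""
        return self.radius

In use, the training loop would call record_episode after each episode and query spawn_radius when resetting the environment, so the goal distribution widens automatically until the full map is covered.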