Deep Reinforcement Learning for Navigation in AAA Video Games
Authors: Eloi Alonso, Maxim Peter, David Goumard, Joshua Romoff
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our approach on complex 3D environments that are notably an order of magnitude larger than maps typically used in the Deep RL literature. One of these environments is from a recently released AAA video game called Hyper Scape. We find that our approach performs surprisingly well, achieving at least 90% success rate in a variety of scenarios using complex navigation abilities. |
| Researcher Affiliation | Collaboration | Eloi Alonso, Maxim Peter, David Goumard and Joshua Romoff, Ubisoft La Forge; eloi.alonso@unige.ch, {maxim.peter, david.goumard, joshua.romoff}@ubisoft.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It links to videos demonstrating the results, but not to a code repository. |
| Open Datasets | No | The paper mentions training on 'Toy Map', 'Big Map' created in Unity, and a map from 'Hyper Scape', but these environments are not provided as publicly accessible datasets with concrete access information (links, DOIs, etc.). |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and test sets). For RL, data is generated through environment interaction, and a curriculum is used for training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Unity game engine', 'Soft Actor-Critic (SAC)', and 'Adam optimizer', but it does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | During training, we spawn the agent and its goal in a cylinder with a variable radius. An episode is considered over when the agent has reached its goal or when the number of steps is over a certain budget. To allow the agent to have informative trajectories at all stages during training, we use a training curriculum [Bengio et al., 2009] and increase the radius of the spawning cylinder until the full map is covered. Specifically, when the agent achieves a success rate of > 80% over the last 200 episodes we increase the spawning radius of the goal. All the networks are trained using the Adam optimizer [Kingma and Ba, 2014]. More details on our hyperparameters can be found in the supplementary material, and Figure 3 describes our architecture. |
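
To make the curriculum described in the last row concrete, the sketch below implements the spawning-radius schedule in Python: a goal is sampled inside a cylinder around the agent, an episode ends on goal completion or when a step budget is exhausted, and the radius grows once the success rate over the last 200 episodes exceeds 80%. The 80% threshold and 200-episode window come from the excerpt above; the radius values, step budget, the `env`/`agent` interfaces, and the choice to reset the rolling window after each increase are illustrative assumptions, not details taken from the paper.

```python
from collections import deque
import math
import random


class SpawnCurriculum:
    """Tracks recent episode outcomes and widens the goal-spawning cylinder.

    The 80% success threshold and 200-episode window follow the paper;
    the initial radius, growth step, and maximum radius are placeholder values.
    """

    def __init__(self, initial_radius=10.0, radius_step=10.0,
                 max_radius=500.0, window=200, success_threshold=0.8):
        self.radius = initial_radius
        self.radius_step = radius_step
        self.max_radius = max_radius
        self.window = window
        self.success_threshold = success_threshold
        self.recent = deque(maxlen=window)

    def sample_goal(self, agent_position):
        """Sample a goal uniformly inside a cylinder centred on the agent."""
        angle = random.uniform(0.0, 2.0 * math.pi)
        dist = self.radius * math.sqrt(random.random())  # uniform over the disc
        x, y, z = agent_position
        return (x + dist * math.cos(angle), y, z + dist * math.sin(angle))

    def record_episode(self, success):
        """Log an outcome and grow the radius when the agent is ready."""
        self.recent.append(1.0 if success else 0.0)
        if (len(self.recent) == self.window
                and sum(self.recent) / self.window > self.success_threshold):
            self.radius = min(self.radius + self.radius_step, self.max_radius)
            self.recent.clear()  # assumption: restart the window after each increase


def run_episode(env, agent, curriculum, step_budget=1000):
    """One episode: ends when the goal is reached or the step budget runs out.

    `env`, `agent`, `env.agent_position()` and the `goal_reached` flag are
    hypothetical interfaces standing in for the game-engine integration.
    """
    obs = env.reset(goal=curriculum.sample_goal(env.agent_position()))
    info = {}
    for _ in range(step_budget):
        obs, reward, done, info = env.step(agent.act(obs))
        if done:
            break
    curriculum.record_episode(info.get("goal_reached", False))
```

In this reading, the curriculum is a thin wrapper around the environment reset: the learning algorithm itself (SAC with Adam, per the paper) is untouched, and only the goal-sampling distribution changes as training progresses.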