Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

Authors: Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level. We conducted evaluations of the environment as well as agent and human performance within the environment. We evaluated human and agent performance within three distinct conditions, each designed to provide insight into the level of generalization ability that the human or agent possesses.
Researcher Affiliation | Collaboration | 1 Unity Technologies, 2 New York University
Pseudocode | No | The paper describes procedural generation using graph grammars and shape grammars for floor layouts, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | https://github.com/Unity-Technologies/obstacle-tower-env
Open Datasets | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations.
Dataset Splits | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations. They should then be tested on a held-out set of five randomly selected tower configuration seeds not in the training set. (A seed-split evaluation sketch follows the table.)
Hardware Specification | Yes | Table 1: Environment performance metrics on n1-highmem-2 GCP instance with NVIDIA Tesla K80.
Software Dependencies | No | The Obstacle Tower environment uses the Unity platform and ML-Agents Toolkit [Juliani et al., 2018]. It can run on the Mac, Windows, and Linux platforms, and can be controlled via the OpenAI Gym interface for easy integration with existing Deep RL training frameworks [Brockman et al., 2016]. In particular we utilized the OpenAI Baselines implementation of Proximal Policy Optimization (PPO) [Schulman et al., 2017; Dhariwal et al., 2017] as well as the implementation of Rainbow provided by the Dopamine library [Hessel et al., 2018; Castro et al., 2018].
Experiment Setup | Yes | We utilized the default hyperparameters provided by each library for use with Atari benchmarks, in order to provide comparable results with evaluations performed on the ALE. We collected data in PPO using 50 concurrently running environments. In the case of Rainbow we collected data from a single environment running serially. We conducted training sessions spanning 20 million environment steps for PPO and Rainbow. (A hedged PPO training sketch appears below the table.)
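
The seed-based generalization protocol quoted in the Dataset Splits row can be summarized in a short evaluation loop. The sketch below is illustrative only: it assumes the Gym-style ObstacleTowerEnv wrapper from the linked repository, and the constructor arguments, the seed() call, and the concrete held-out seed values are assumptions rather than the authors' exact setup.

```python
# Hedged sketch of the train/test seed protocol: train on 100 fixed seeds,
# evaluate on held-out seeds not in the training set.
import numpy as np
from obstacle_tower_env import ObstacleTowerEnv  # assumed import path from the linked repo

TRAIN_SEEDS = list(range(100))          # fixed set of 100 training seeds
TEST_SEEDS = [101, 202, 303, 404, 505]  # five held-out seeds (placeholder values)

def evaluate(env, policy, seeds, episodes_per_seed=5):
    """Average episodic return of `policy` over the given tower seeds."""
    returns = []
    for seed in seeds:
        env.seed(seed)                  # fixes the procedural tower configuration
        for _ in range(episodes_per_seed):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                action = policy(obs)    # `policy` maps an observation to an action
                obs, reward, done, _ = env.step(action)
                total += reward
            returns.append(total)
    return float(np.mean(returns))

if __name__ == "__main__":
    # retro/realtime_mode arguments are assumptions about the wrapper's interface.
    env = ObstacleTowerEnv(retro=True, realtime_mode=False)
    random_policy = lambda obs: env.action_space.sample()
    print("held-out score:", evaluate(env, random_policy, TEST_SEEDS))
    env.close()
```

In the paper's protocol the held-out seeds are drawn at random from outside the 100 training seeds, so the values above are placeholders, not the evaluation seeds used by the authors.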
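
For the Experiment Setup row, a similarly hedged sketch of the PPO configuration (50 concurrent environments, default Atari-style hyperparameters, a 20-million-step budget) might look as follows. The worker_id argument and the exact baselines.ppo2 entry point are assumptions; this is not the authors' training script.

```python
# Hedged sketch of the PPO setup: 50 parallel environment copies trained
# for 20M steps with the library's default Atari-style hyperparameters.
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.ppo2 import ppo2
from obstacle_tower_env import ObstacleTowerEnv   # assumed import path

NUM_ENVS = 50             # 50 concurrently running environments (per the row above)
TOTAL_STEPS = 20_000_000  # 20 million environment steps

def make_env(rank):
    def _thunk():
        # Each Unity instance is assumed to need its own worker_id so the
        # 50 copies can communicate over distinct ports.
        return ObstacleTowerEnv(worker_id=rank, retro=True, realtime_mode=False)
    return _thunk

if __name__ == "__main__":
    vec_env = SubprocVecEnv([make_env(i) for i in range(NUM_ENVS)])
    # network="cnn" with the library defaults stands in for the "default
    # hyperparameters ... for use with Atari benchmarks" quoted above.
    model = ppo2.learn(network="cnn", env=vec_env, total_timesteps=TOTAL_STEPS)
    model.save("obstacle_tower_ppo")
    vec_env.close()
```

The Rainbow runs described in the same row would instead use a single serially running environment with the Dopamine library's default configuration.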