Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning
Authors: Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level. We conducted evaluations of the environment as well as agent and human performance within the environment. We evaluated human and agent performance within three distinct conditions, each designed to provide insight into the level of generalization ability that the human or agent possesses. |
| Researcher Affiliation | Collaboration | ¹Unity Technologies, ²New York University |
| Pseudocode | No | The paper describes procedural generation using graph grammars and shape grammars for floor layouts, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | https://github.com/Unity-Technologies/obstacle-tower-env |
| Open Datasets | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations. |
| Dataset Splits | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations. They should then be tested on a held-out set of five randomly selected tower configuration seeds not in the training set. |
| Hardware Specification | Yes | Table 1: Environment performance metrics on n1-highmem-2 GCP instance with NVIDIA Tesla K80. |
| Software Dependencies | No | The Obstacle Tower environment uses the Unity platform and ML-Agents Toolkit [Juliani et al., 2018]. It can run on the Mac, Windows, and Linux platforms, and can be controlled via the OpenAI Gym interface for easy integration with existing Deep RL training frameworks [Brockman et al., 2016]. In particular we utilized the OpenAI Baselines implementation of Proximal Policy Optimization (PPO) [Schulman et al., 2017; Dhariwal et al., 2017] as well as the implementation of Rainbow provided by the Dopamine library [Hessel et al., 2018; Castro et al., 2018]. |
| Experiment Setup | Yes | We utilized the default hyperparameters provided by each library for use with Atari benchmarks, in order to provide comparable results with evaluations performed on the ALE. We collected data in PPO using 50 concurrently running environments. In the case of Rainbow we collected data from a single environment running serially. We conducted training sessions spanning 20 million environment steps for PPO and Rainbow. |
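The Software Dependencies row notes that Obstacle Tower is exposed through the OpenAI Gym interface via the linked obstacle-tower-env repository. The sketch below illustrates the kind of Gym-style interaction loop this implies; the binary path and the specific constructor arguments are illustrative assumptions about the wrapper, not details taken from the paper.

```python
# Minimal Gym-style interaction sketch for Obstacle Tower.
# Assumes the obstacle_tower_env package from the linked repository;
# the binary path and constructor arguments shown here are illustrative.
from obstacle_tower_env import ObstacleTowerEnv

env = ObstacleTowerEnv("./ObstacleTower/obstacletower",
                       retro=True, realtime_mode=False)
env.seed(0)          # fix the tower-generation seed
obs = env.reset()

done = False
episode_reward = 0.0
while not done:
    action = env.action_space.sample()            # random policy placeholder
    obs, reward, done, info = env.step(action)
    episode_reward += reward

print(f"Episode reward: {episode_reward}")
env.close()
```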
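The Open Datasets and Dataset Splits rows describe the paper's evaluation protocol: agents train on a fixed set of 100 tower seeds and are then tested on five held-out seeds not in the training set. Below is a hedged sketch of that split; `run_episode` is a hypothetical helper standing in for a trained agent's rollout, and the RNG seed and seed ranges are illustrative.

```python
import random

# Evaluation-protocol sketch: 100 fixed training seeds, 5 held-out test seeds,
# as described in the Dataset Splits row. Seed ranges here are illustrative.
TRAIN_SEEDS = list(range(100))                  # fixed training configurations
rng = random.Random(42)                         # illustrative RNG seed
TEST_SEEDS = rng.sample(range(100, 1000), 5)    # held-out seeds not seen in training

def run_episode(env, policy, seed):
    """Roll out `policy` on the tower generated from `seed`; return total reward."""
    env.seed(seed)
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total

# Usage (assuming `env` and a trained `policy` exist):
# test_scores = [run_episode(env, policy, s) for s in TEST_SEEDS]
# print(sum(test_scores) / len(test_scores))
```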
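The Experiment Setup row reports PPO trained with library-default Atari hyperparameters, 50 concurrent environments, and 20 million environment steps. A minimal sketch of that configuration using OpenAI Baselines follows; `make_env`, the per-process `worker_id`, and the environment constructor arguments are assumptions for illustration, not the authors' exact launch script.

```python
# Training-setup sketch mirroring the Experiment Setup row: Baselines PPO with
# default hyperparameters, 50 parallel environments, 20M total steps.
from baselines.common.vec_env import SubprocVecEnv
from baselines.ppo2 import ppo2

NUM_ENVS = 50
TOTAL_STEPS = 20_000_000

def make_env(rank):
    # Hypothetical factory returning one Gym-wrapped Obstacle Tower instance.
    def _thunk():
        from obstacle_tower_env import ObstacleTowerEnv
        return ObstacleTowerEnv("./ObstacleTower/obstacletower",
                                retro=True, worker_id=rank)
    return _thunk

if __name__ == "__main__":
    venv = SubprocVecEnv([make_env(i) for i in range(NUM_ENVS)])
    # Unspecified hyperparameters fall back to library defaults, as in the paper.
    model = ppo2.learn(network="cnn", env=venv, total_timesteps=TOTAL_STEPS)
```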