Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

Authors: Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Ervin Teng, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we outline the environment and provide a set of baseline results produced by current state-of-the-art Deep RL methods as well as human players. These algorithms fail to produce agents capable of performing near human level. We conducted evaluations of the environment as well as agent and human performance within the environment. We evaluated human and agent performance within three distinct conditions, each designed to provide insight into the level of generalization ability that the human or agent possesses.
Researcher Affiliation | Collaboration | 1 Unity Technologies, 2 New York University
Pseudocode | No | The paper describes procedural generation using graph grammars and shape grammars for floor layouts, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | https://github.com/Unity-Technologies/obstacle-tower-env
Open Datasets | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations.
Dataset Splits | No | Agents should be trained on a fixed set of 100 seeds for the environment configurations. They should then be tested on a held-out set of five randomly selected tower configuration seeds not in the training set. (A seed-split evaluation sketch follows the table.)
Hardware Specification | Yes | Table 1: Environment performance metrics on n1-highmem-2 GCP instance with NVIDIA Tesla K80.
Software Dependencies | No | The Obstacle Tower environment uses the Unity platform and ML-Agents Toolkit [Juliani et al., 2018]. It can run on the Mac, Windows, and Linux platforms, and can be controlled via the OpenAI Gym interface for easy integration with existing Deep RL training frameworks [Brockman et al., 2016]. In particular we utilized the OpenAI Baselines implementation of Proximal Policy Optimization (PPO) [Schulman et al., 2017; Dhariwal et al., 2017] as well as the implementation of Rainbow provided by the Dopamine library [Hessel et al., 2018; Castro et al., 2018].
Experiment Setup | Yes | We utilized the default hyperparameters provided by each library for use with Atari benchmarks, in order to provide comparable results with evaluations performed on the ALE. We collected data in PPO using 50 concurrently running environments. In the case of Rainbow we collected data from a single environment running serially. We conducted training sessions spanning 20 million environment steps for PPO and Rainbow. (A hedged PPO training sketch appears below the table.)
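
The seed-based generalization protocol quoted in the Dataset Splits row can be summarized in a short evaluation loop. The sketch below is illustrative only: it assumes the Gym-style ObstacleTowerEnv wrapper from the linked repository, and the constructor arguments, the seed() call, and the concrete held-out seed values are assumptions rather than the authors' exact setup.

```python
# Hedged sketch of the train/test seed protocol: train on 100 fixed seeds,
# evaluate on held-out seeds not in the training set.
import numpy as np
from obstacle_tower_env import ObstacleTowerEnv  # assumed import path from the linked repo

TRAIN_SEEDS = list(range(100))          # fixed set of 100 training seeds
TEST_SEEDS = [101, 202, 303, 404, 505]  # five held-out seeds (placeholder values)

def evaluate(env, policy, seeds, episodes_per_seed=5):
    """Average episodic return of `policy` over the given tower seeds."""
    returns = []
    for seed in seeds:
        env.seed(seed)                  # fixes the procedural tower configuration
        for _ in range(episodes_per_seed):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                action = policy(obs)    # `policy` maps an observation to an action
                obs, reward, done, _ = env.step(action)
                total += reward
            returns.append(total)
    return float(np.mean(returns))

if __name__ == "__main__":
    # retro/realtime_mode arguments are assumptions about the wrapper's interface.
    env = ObstacleTowerEnv(retro=True, realtime_mode=False)
    random_policy = lambda obs: env.action_space.sample()
    print("held-out score:", evaluate(env, random_policy, TEST_SEEDS))
    env.close()
```

In the paper's protocol the held-out seeds are drawn at random from outside the 100 training seeds, so the values above are placeholders, not the evaluation seeds used by the authors.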
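
For the Experiment Setup row, a similarly hedged sketch of the PPO configuration (50 concurrent environments, default Atari-style hyperparameters, a 20-million-step budget) might look as follows. The worker_id argument and the exact baselines.ppo2 entry point are assumptions; this is not the authors' training script.

```python
# Hedged sketch of the PPO setup: 50 parallel environment copies trained
# for 20M steps with the library's default Atari-style hyperparameters.
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.ppo2 import ppo2
from obstacle_tower_env import ObstacleTowerEnv   # assumed import path

NUM_ENVS = 50             # 50 concurrently running environments (per the row above)
TOTAL_STEPS = 20_000_000  # 20 million environment steps

def make_env(rank):
    def _thunk():
        # Each Unity instance is assumed to need its own worker_id so the
        # 50 copies can communicate over distinct ports.
        return ObstacleTowerEnv(worker_id=rank, retro=True, realtime_mode=False)
    return _thunk

if __name__ == "__main__":
    vec_env = SubprocVecEnv([make_env(i) for i in range(NUM_ENVS)])
    # network="cnn" with the library defaults stands in for the "default
    # hyperparameters ... for use with Atari benchmarks" quoted above.
    model = ppo2.learn(network="cnn", env=vec_env, total_timesteps=TOTAL_STEPS)
    model.save("obstacle_tower_ppo")
    vec_env.close()
```

The Rainbow runs described in the same row would instead use a single serially running environment with the Dopamine library's default configuration.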