Deep Hierarchical Planning from Pixels

Authors: Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

NeurIPS 2022

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We evaluate Director on two challenging benchmark suites with visual inputs and very sparse rewards, which we expect to be challenging to solve using a flat policy without hierarchy (Section 3.1). We further evaluate Director on a wide range of standard tasks from the literature to demonstrate its generality and ensure that the hierarchy is not harmful in simple settings (Section 3.2).
Researcher Affiliation | Collaboration | 1 UC Berkeley, 2 Google Research, 3 University of Toronto, 4 Covariant
Pseudocode | Yes | For the pseudo code of Director, refer to Appendix E. (An illustrative training-loop sketch follows the table.)
Open Source Code | Yes | Project website with videos and code: https://danijar.com/director. All our agents and environments will be open sourced upon publication to facilitate future research in hierarchical reinforcement learning.
Open Datasets | Yes | We choose Atari games (Bellemare et al., 2013), the Control Suite from pixels (Tassa et al., 2018), Crafter (Hafner, 2021), and tasks from DMLab (Beattie et al., 2016) to cover a spectrum of challenges, including continuous and discrete actions and 2D and 3D environments. (An environment-setup sketch follows the table.)
Dataset Splits | No | The paper does not explicitly provide numerical training, validation, or test dataset splits (e.g., 80/10/10%). It mentions that the world model is trained from a replay buffer and the policies from imagined rollouts, and it discusses evaluation on benchmarks, but no specific dataset partitioning details for reproducibility are given.
Hardware Specification | Yes | Each training run used a single V100 GPU with XLA and mixed precision enabled and completed in less than 24 hours. (A TensorFlow settings sketch follows the table.)
Software Dependencies | No | We implemented Director on top of the public source code of DreamerV2 (Hafner et al., 2020a), reusing its default hyperparameters. The paper names DreamerV2 but does not specify a version number or list other software dependencies with their versions.
Experiment Setup | Yes | We use a fixed set of hyperparameters not only across tasks but also across domains, detailed in Table F.1. (A shared-configuration sketch follows the table.)
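
The paper's actual pseudocode is given in its Appendix E. As a rough, non-authoritative illustration of the hierarchy it describes (a manager that proposes goals in the world model's latent space at a fixed interval and a worker that acts to reach them, both trained on imagined rollouts), here is a minimal Python sketch. Every name (`world_model`, `goal_autoencoder`, `manager`, `worker`, `replay_buffer`) and both default values are placeholders, not the authors' API or settings.

```python
# Illustrative sketch only; the authors' pseudocode is in Appendix E of the paper.
# All object interfaces and default values below are hypothetical placeholders.

def train_director_step(world_model, goal_autoencoder, manager, worker,
                        replay_buffer, goal_every=8, horizon=16):
    """One high-level training iteration of a Director-style agent.

    goal_every and horizon are placeholder defaults; the actual settings
    are listed in Table F.1 of the paper.
    """
    # 1. World-model learning from replayed experience.
    batch = replay_buffer.sample()
    start_state = world_model.train(batch)

    # 2. Goal autoencoder compresses world-model states into compact codes.
    goal_autoencoder.train(start_state)

    # 3. Imagined rollout: the manager picks a new abstract goal every
    #    `goal_every` steps; the worker chooses primitive actions to reach it.
    states, goals = [], []
    state, goal = start_state, None
    for t in range(horizon):
        if t % goal_every == 0:
            code = manager.act(state)             # abstract action
            goal = goal_autoencoder.decode(code)  # goal in world-model state space
        action = worker.act(state, goal)
        state = world_model.step(state, action)
        states.append(state)
        goals.append(goal)

    # 4. Policy learning in imagination: the manager is trained on task reward
    #    plus an exploration bonus, the worker on a goal-reaching reward.
    manager.train(states)
    worker.train(states, goals)
```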
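For the benchmark suites listed in the Open Datasets row, a common way to instantiate them is via the standard `gym`/`ale-py`, `dm_control`, `crafter`, and `deepmind_lab` packages. This is an assumption about tooling, not the authors' environment wrappers, and the chosen tasks are arbitrary examples.

```python
# Illustrative only: standard instantiation of the four benchmark suites named
# in the paper. Package choices and task names are assumptions, not the
# authors' wrappers.
import gym                    # Atari via the Arcade Learning Environment
from dm_control import suite  # DeepMind Control Suite
import crafter                # Crafter survival benchmark
import deepmind_lab           # DMLab 3D environments

atari_env = gym.make("ALE/MsPacman-v5")  # id depends on gym/ale-py version
control_env = suite.load(domain_name="walker", task_name="walk")
crafter_env = crafter.Env()
dmlab_env = deepmind_lab.Lab(
    "contributed/dmlab30/explore_goal_locations_small",  # example level name
    ["RGB_INTERLEAVED"],
    config={"width": "64", "height": "64"},
)
```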
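The hardware row mentions XLA and mixed precision on a single V100. Assuming the TensorFlow 2 stack of the public DreamerV2 code, these options are typically enabled as below; this is a generic sketch, not the authors' configuration.

```python
# Illustrative TensorFlow settings (assumption: TF 2.x); not copied from the
# authors' code or configuration.
import tensorflow as tf

# Enable XLA just-in-time compilation globally.
tf.config.optimizer.set_jit(True)

# Compute in float16 while keeping float32 variables (mixed precision).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Individual tf.functions can also request XLA compilation explicitly.
@tf.function(jit_compile=True)
def scaled_sum(x):
    return tf.reduce_sum(x * 2.0)

print(scaled_sum(tf.ones([3], dtype=tf.float16)))
```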
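The experiment-setup row states that one hyperparameter set is shared across all tasks and domains (Table F.1). The sketch below only illustrates that pattern; every key and value is a placeholder rather than a number from the paper.

```python
# Illustrative only: the actual hyperparameters are listed in Table F.1 of the
# paper. All values below are placeholders showing the "one config for all
# domains" pattern.
SHARED_CONFIG = {
    "goal_duration_steps": 8,    # placeholder: how often the manager picks a goal
    "imagination_horizon": 16,   # placeholder: imagined rollout length
    "batch_size": 16,            # placeholder
    "learning_rate": 1e-4,       # placeholder
}

def make_agent_config(domain):
    """Return the same hyperparameters regardless of the benchmark domain."""
    assert domain in {"atari", "control_suite", "crafter", "dmlab"}
    return dict(SHARED_CONFIG)  # no per-domain overrides
```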