Deep Hierarchical Planning from Pixels
Authors: Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Director on two challenging benchmark suites with visual inputs and very sparse rewards, which we expect to be challenging to solve using a flat policy without hierarchy (Section 3.1). We further evaluate Director on a wide range of standard tasks from the literature to demonstrate its generality and ensure that the hierarchy is not harmful in simple settings (Section 3.2). |
| Researcher Affiliation | Collaboration | ¹UC Berkeley, ²Google Research, ³University of Toronto, ⁴Covariant |
| Pseudocode | Yes | For the pseudo code of Director, refer to Appendix E. |
| Open Source Code | Yes | Project website with videos and code: https://danijar.com/director. All our agents and environments will be open sourced upon publication to facilitate future research in hierarchical reinforcement learning. |
| Open Datasets | Yes | We choose Atari games (Bellemare et al., 2013), the Control Suite from pixels (Tassa et al., 2018), Crafter (Hafner, 2021), and tasks from DMLab (Beattie et al., 2016) to cover a spectrum of challenges, including continuous and discrete actions and 2D and 3D environments. |
| Dataset Splits | No | The paper does not explicitly provide numerical training, validation, or test dataset splits (e.g., 80/10/10%). It mentions that the world model is trained from a replay buffer and policies from imagined rollouts, and discusses evaluation on benchmarks, but no specific dataset partitioning details for reproducibility are given. |
| Hardware Specification | Yes | Each training run used a single V100 GPU with XLA and mixed precision enabled and completed in less than 24 hours. |
| Software Dependencies | No | We implemented Director on top of the public source code of DreamerV2 (Hafner et al., 2020a), reusing its default hyperparameters. The paper names DreamerV2 as its code base but does not give a release or commit for it, and it lists no other software dependencies or version numbers. |
| Experiment Setup | Yes | We use a fixed set of hyperparameters not only across tasks but also across domains, detailed in Table F.1. |