Dream to Control: Learning Behaviors by Latent Imagination

Authors: Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally evaluate Dreamer on a variety of control tasks. We designed the experiments to compare Dreamer to current best methods in the literature, and to evaluate its ability to solve tasks with long horizons, continuous actions, discrete actions, and early termination.
Researcher Affiliation | Collaboration | Danijar Hafner (University of Toronto, Google Brain), Timothy Lillicrap (DeepMind), Jimmy Ba (University of Toronto), Mohammad Norouzi (Google Brain)
Pseudocode | Yes | Algorithm 1: Dreamer
Open Source Code | Yes | The source code for all our experiments and videos of Dreamer are available at https://danijar.com/dreamer.
Open Datasets | Yes | We evaluate Dreamer on 20 visual control tasks of the DeepMind Control Suite (Tassa et al., 2018), illustrated in Figure 2.
Dataset Splits | No | No explicit description of training/test/validation dataset splits with percentages, absolute counts, or references to predefined splits was found. The paper mentions training models but does not detail how data is partitioned for validation purposes.
Hardware Specification | Yes | We use a single Nvidia V100 GPU and 10 CPU cores for each training run.
Software Dependencies | No | The paper mentions 'TensorFlow Probability' and 'Adam' but does not provide specific version numbers for these software dependencies, which are required for full reproducibility.
Experiment Setup | Yes | We draw batches of 50 sequences of length 50 to train the world model, value model, and action model using Adam (Kingma and Ba, 2014) with learning rates 6×10⁻⁴, 8×10⁻⁵, and 8×10⁻⁵, respectively, and scale down gradient norms that exceed 100. We do not scale the KL regularizers (β = 1) but clip them below 3 free nats as in PlaNet. The imagination horizon is H = 15 and the same trajectories are used to update both action and value models. We compute the Vλ targets with γ = 0.99 and λ = 0.95.
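To make the experiment setup row concrete, below is a minimal, hypothetical Python sketch (not the authors' code): it collects the quoted hyperparameters in a config dictionary and shows one common recursive way to compute λ-return targets such as Vλ with γ = 0.99 and λ = 0.95. The function name, array shapes, and the use of NumPy are illustrative assumptions.

```python
# Hypothetical sketch of the reported Dreamer experiment setup; values are
# quoted from the paper, structure and names are assumptions for illustration.
import numpy as np

CONFIG = {
    "batch_size": 50,           # sequences per batch
    "sequence_length": 50,      # time steps per sequence
    "lr_world_model": 6e-4,     # Adam learning rates
    "lr_value_model": 8e-5,
    "lr_action_model": 8e-5,
    "grad_clip_norm": 100.0,    # scale down gradient norms exceeding 100
    "kl_scale": 1.0,            # beta = 1 (KL regularizers not scaled)
    "free_nats": 3.0,           # clip KL below 3 free nats, as in PlaNet
    "imagination_horizon": 15,  # H
    "discount": 0.99,           # gamma
    "lambda_": 0.95,            # lambda for the V_lambda targets
}

def lambda_returns(rewards, values, discount=0.99, lam=0.95):
    """Compute lambda-return targets over an imagined trajectory.

    rewards: array of shape [H] with predicted rewards.
    values:  array of shape [H + 1] with predicted state values.
    Returns an array of shape [H] with the lambda-return targets.
    """
    horizon = len(rewards)
    returns = np.zeros(horizon)
    next_return = values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap with the longer lambda-return.
        next_return = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns

# Example: targets for a horizon-15 imagined rollout (H = 15), dummy inputs.
H = CONFIG["imagination_horizon"]
targets = lambda_returns(np.zeros(H), np.ones(H + 1),
                         CONFIG["discount"], CONFIG["lambda_"])
```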