Dream to Control: Learning Behaviors by Latent Imagination

Authors: Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally evaluate Dreamer on a variety of control tasks. We designed the experiments to compare Dreamer to current best methods in the literature, and to evaluate its ability to solve tasks with long horizons, continuous actions, discrete actions, and early termination.
Researcher Affiliation | Collaboration | Danijar Hafner (University of Toronto, Google Brain), Timothy Lillicrap (DeepMind), Jimmy Ba (University of Toronto), Mohammad Norouzi (Google Brain)
Pseudocode | Yes | Algorithm 1: Dreamer
Open Source Code | Yes | The source code for all our experiments and videos of Dreamer are available at https://danijar.com/dreamer.
Open Datasets | Yes | We evaluate Dreamer on 20 visual control tasks of the DeepMind Control Suite (Tassa et al., 2018), illustrated in Figure 2.
Dataset Splits | No | No explicit description of training/test/validation dataset splits with percentages, absolute counts, or references to predefined splits was found. The paper mentions training models but does not detail how data is partitioned for validation purposes.
Hardware Specification | Yes | We use a single Nvidia V100 GPU and 10 CPU cores for each training run.
Software Dependencies | No | The paper mentions 'TensorFlow Probability' and 'Adam' but does not provide specific version numbers for these software dependencies, which are required for full reproducibility.
Experiment Setup | Yes | We draw batches of 50 sequences of length 50 to train the world model, value model, and action model using Adam (Kingma and Ba, 2014) with learning rates 6×10⁻⁴, 8×10⁻⁵, and 8×10⁻⁵, respectively, and scale down gradient norms that exceed 100. We do not scale the KL regularizers (β = 1) but clip them below 3 free nats as in PlaNet. The imagination horizon is H = 15 and the same trajectories are used to update both action and value models. We compute the Vλ targets with γ = 0.99 and λ = 0.95.
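To make the experiment setup row concrete, below is a minimal, hypothetical Python sketch (not the authors' code): it collects the quoted hyperparameters in a config dictionary and shows one common recursive way to compute λ-return targets such as Vλ with γ = 0.99 and λ = 0.95. The function name, array shapes, and the use of NumPy are illustrative assumptions.

```python
# Hypothetical sketch of the reported Dreamer experiment setup; values are
# quoted from the paper, structure and names are assumptions for illustration.
import numpy as np

CONFIG = {
    "batch_size": 50,           # sequences per batch
    "sequence_length": 50,      # time steps per sequence
    "lr_world_model": 6e-4,     # Adam learning rates
    "lr_value_model": 8e-5,
    "lr_action_model": 8e-5,
    "grad_clip_norm": 100.0,    # scale down gradient norms exceeding 100
    "kl_scale": 1.0,            # beta = 1 (KL regularizers not scaled)
    "free_nats": 3.0,           # clip KL below 3 free nats, as in PlaNet
    "imagination_horizon": 15,  # H
    "discount": 0.99,           # gamma
    "lambda_": 0.95,            # lambda for the V_lambda targets
}

def lambda_returns(rewards, values, discount=0.99, lam=0.95):
    """Compute lambda-return targets over an imagined trajectory.

    rewards: array of shape [H] with predicted rewards.
    values:  array of shape [H + 1] with predicted state values.
    Returns an array of shape [H] with the lambda-return targets.
    """
    horizon = len(rewards)
    returns = np.zeros(horizon)
    next_return = values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap with the longer lambda-return.
        next_return = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns

# Example: targets for a horizon-15 imagined rollout (H = 15), dummy inputs.
H = CONFIG["imagination_horizon"]
targets = lambda_returns(np.zeros(H), np.ones(H + 1),
                         CONFIG["discount"], CONFIG["lambda_"])
```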