Dream to Control: Learning Behaviors by Latent Imagination
Authors: Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluate Dreamer on a variety of control tasks. We designed the experiments to compare Dreamer to current best methods in the literature, and to evaluate its ability to solve tasks with long horizons, continuous actions, discrete actions, and early termination. |
| Researcher Affiliation | Collaboration | Danijar Hafner (University of Toronto, Google Brain); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) |
| Pseudocode | Yes | Algorithm 1: Dreamer |
| Open Source Code | Yes | The source code for all our experiments and videos of Dreamer are available at https://danijar.com/dreamer. |
| Open Datasets | Yes | We evaluate Dreamer on 20 visual control tasks of the DeepMind Control Suite (Tassa et al., 2018), illustrated in Figure 2. |
| Dataset Splits | No | No explicit description of training/validation/test dataset splits, whether as percentages, absolute counts, or references to predefined splits, was found. The paper describes how models are trained but does not detail how data is partitioned for validation. |
| Hardware Specification | Yes | We use a single Nvidia V100 GPU and 10 CPU cores for each training run. |
| Software Dependencies | No | The paper mentions 'TensorFlow Probability' and 'Adam' but does not provide specific version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | We draw batches of 50 sequences of length 50 to train the world model, value model, and action model using Adam (Kingma and Ba, 2014) with learning rates 6×10⁻⁴, 8×10⁻⁵, and 8×10⁻⁵, respectively, and scale down gradient norms that exceed 100. We do not scale the KL regularizers (β = 1) but clip them below 3 free nats as in PlaNet. The imagination horizon is H = 15 and the same trajectories are used to update both action and value models. We compute the Vλ targets with γ = 0.99 and λ = 0.95. |
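
The experiment-setup row pins down the value-learning hyperparameters (γ = 0.99, λ = 0.95, H = 15). As a minimal sketch of how such Vλ targets can be computed, the NumPy snippet below uses the standard backward recursion for λ-returns; the function name `lambda_returns`, the array shapes, and the dummy inputs are our own illustration, not the paper's released code.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
# Optimizer settings (not shown here): Adam with learning rates
# 6e-4 (world model), 8e-5 (value), 8e-5 (action), and gradient
# norms scaled down when they exceed 100.
GAMMA = 0.99    # discount factor gamma
LAMBDA = 0.95   # mixing coefficient lambda for the V_lambda targets
HORIZON = 15    # imagination horizon H

def lambda_returns(rewards, values, gamma=GAMMA, lam=LAMBDA):
    """Compute V_lambda targets over one imagined trajectory.

    rewards: shape [H]   -- predicted rewards along the imagined rollout
    values:  shape [H+1] -- predicted values, including the bootstrap
                            value of the final imagined state
    Uses the backward recursion
        target[t] = r[t] + gamma * ((1 - lam) * v[t+1] + lam * target[t+1])
    bootstrapped with target[H] = v[H].
    """
    horizon = len(rewards)
    targets = np.empty(horizon)
    target = values[-1]  # bootstrap from the last imagined value
    for t in reversed(range(horizon)):
        target = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * target)
        targets[t] = target
    return targets

# Usage with dummy imagined quantities over the reported horizon H = 15.
rng = np.random.default_rng(0)
rewards = rng.normal(size=HORIZON)
values = rng.normal(size=HORIZON + 1)
print(lambda_returns(rewards, values))  # shape [15]
```

With λ = 0.95, the targets interpolate between one-step bootstrapped returns (λ = 0) and Monte-Carlo-style returns over the full imagined horizon (λ → 1), which is why the paper reports γ and λ alongside the horizon H.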