Mastering Atari with Discrete World Models
Authors: Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DreamerV2 on the well-established Atari benchmark with sticky actions, comparing to four strong model-free algorithms. |
| Researcher Affiliation | Collaboration | Danijar Hafner, Google Research; Timothy Lillicrap, DeepMind; Mohammad Norouzi, Google Research; Jimmy Ba, University of Toronto |
| Pseudocode | Yes | Algorithm 1: Straight-Through Gradients with Automatic Differentiation |
| Open Source Code | Yes | Refer to the project website for videos, the source code, and training curves in JSON format: https://danijar.com/dreamerv2 |
| Open Datasets | Yes | We evaluate DreamerV2 on the well-established Atari benchmark with sticky actions, comparing to four strong model-free algorithms. |
| Dataset Splits | No | No explicit mention of training/validation/test dataset splits with percentages or sample counts. The paper describes data generation through interaction with the Atari environment, not pre-split datasets. |
| Hardware Specification | Yes | Our implementation of DreamerV2 reaches 200M environment steps in under 10 days, while using only a single NVIDIA V100 GPU and a single environment instance. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer" and "ELU activation function" but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Table D1: Atari hyperparameters of DreamerV2. When tuning the agent for a new task, we recommend searching over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, actor entropy loss scale η ∈ {3e-5, 1e-4, 3e-4, 1e-3}, and the discount factor γ ∈ {0.99, 0.999}. |
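The pseudocode row above refers to the paper's Algorithm 1, which uses straight-through gradients to backpropagate through discrete categorical samples: the forward pass uses the hard one-hot sample, while the backward pass routes gradients through the underlying probabilities. A minimal framework-free sketch of this identity is shown below; `stop_gradient` is a placeholder for an autodiff framework's gradient-blocking operation (e.g. `tf.stop_gradient`), and is numerically the identity, so the expression `one_hot + probs - stop_gradient(probs)` evaluates exactly to the one-hot sample.

```python
import random

def stop_gradient(x):
    # Placeholder for an autodiff framework's stop-gradient op:
    # numerically the identity, but blocks gradient flow in the backward pass.
    return x

def straight_through_sample(probs, rng=random):
    """Sample a one-hot vector from a categorical distribution with
    straight-through gradients (sketch of the idea in Algorithm 1)."""
    # Draw a hard categorical sample as a one-hot vector.
    idx = rng.choices(range(len(probs)), weights=probs)[0]
    one_hot = [1.0 if i == idx else 0.0 for i in range(len(probs))]
    # Forward value equals the hard sample because probs - sg(probs) = 0;
    # in an autodiff framework, gradients would flow through `probs`.
    return [h + p - stop_gradient(p) for h, p in zip(one_hot, probs)]
```

In a real implementation this trick lets the world model's discrete latents stay differentiable end-to-end without REINFORCE-style estimators.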