Mastering Atari with Discrete World Models

Authors: Danijar Hafner, Timothy P. Lillicrap, Mohammad Norouzi, Jimmy Ba

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DreamerV2 on the well-established Atari benchmark with sticky actions, comparing to four strong model-free algorithms. |
| Researcher Affiliation | Collaboration | Danijar Hafner (Google Research), Timothy Lillicrap (DeepMind), Mohammad Norouzi (Google Research), Jimmy Ba (University of Toronto) |
| Pseudocode | Yes | Algorithm 1: Straight-Through Gradients with Automatic Differentiation (a minimal sketch follows this table) |
| Open Source Code | Yes | Refer to the project website for videos, the source code, and training curves in JSON format: https://danijar.com/dreamerv2 |
| Open Datasets | Yes | We evaluate DreamerV2 on the well-established Atari benchmark with sticky actions, comparing to four strong model-free algorithms. |
| Dataset Splits | No | No explicit mention of training/validation/test dataset splits with percentages or sample counts. The paper describes data generation through interaction with the Atari environment, not pre-split datasets. |
| Hardware Specification | Yes | Our implementation of DreamerV2 reaches 200M environment steps in under 10 days, while using only a single NVIDIA V100 GPU and a single environment instance. |
| Software Dependencies | No | The paper mentions software components like the "Adam optimizer" and the "ELU activation function" but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Table D1: Atari hyperparameters of DreamerV2. When tuning the agent for a new task, we recommend searching over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, actor entropy loss scale η ∈ {3e-5, 1e-4, 3e-4, 1e-3}, and the discount factor γ ∈ {0.99, 0.999}. (a hypothetical search-grid sketch also follows this table) |
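The pseudocode flagged above, Algorithm 1, implements straight-through gradients for the paper's categorical latent variables. Below is a minimal sketch of that technique in TensorFlow, assuming a batch of logits of shape `[batch, num_classes]`; the function name is illustrative, not the authors' released code.

```python
import tensorflow as tf

def sample_one_hot_straight_through(logits):
    """Sample a one-hot categorical vector whose gradient w.r.t. the
    logits is the softmax gradient (straight-through estimator).

    Args:
      logits: float tensor of shape [batch, num_classes].
    """
    # Draw a discrete sample; this step has no gradient.
    index = tf.random.categorical(logits, num_samples=1)[:, 0]
    sample = tf.one_hot(index, depth=logits.shape[-1], dtype=logits.dtype)
    # Re-attach the gradient of the softmax probabilities to the hard
    # sample: the forward pass returns `sample` unchanged, while the
    # backward pass propagates d(probs)/d(logits).
    probs = tf.nn.softmax(logits, axis=-1)
    return sample + probs - tf.stop_gradient(probs)
```

Because the forward pass yields the discrete sample while the backward pass uses the probability gradient, automatic differentiation can train the discrete world-model latents end to end.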
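The tuning advice quoted in the Experiment Setup row maps directly onto a small grid search. A hypothetical sketch of that grid follows; the key names are illustrative and not taken from the released code.

```python
import itertools

# Tuning ranges recommended in the paper (Table D1); key names are hypothetical.
search_space = {
    "kl_loss_scale": [0.1, 0.3, 1.0, 3.0],            # β
    "actor_entropy_scale": [3e-5, 1e-4, 3e-4, 1e-3],  # η
    "discount": [0.99, 0.999],                        # γ
}

# Enumerate all 4 * 4 * 2 = 32 combinations for a grid search on a new task.
configs = [dict(zip(search_space, combo))
           for combo in itertools.product(*search_space.values())]
```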