Learning to Play Atari in a World of Tokens

Authors: Pranav Agarwal, Sheldon Andrews, Samira Ebrahimi Kahou

ICML 2024

Each reproducibility variable is listed below with its result and the supporting LLM response; code sketches referenced in the entries follow the list.
Research Type: Experimental — DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample-efficiency benchmark, with a median human-normalized score of 0.790, and beats humans in 9 out of 26 games. (A worked sketch of the human-normalized score appears after this list.)
Researcher Affiliation: Collaboration — ¹École de technologie supérieure, Canada; ²Mila; ³Roblox, USA; ⁴University of Calgary, Canada; ⁵Canada CIFAR AI Chair. Correspondence to: Pranav Agarwal <pranav.agarwal.1@ens.etsmtl.ca>.
Pseudocode: Yes — Algorithm 1, "Integrating DART with lookahead search methods"; Algorithm 2, "Modeling DART for continuous action space". (A generic rollout-planning sketch follows this list.)
Open Source Code: Yes — We release our code at https://pranaval.github.io/DART/.
Open Datasets: Yes — We evaluated our model alongside existing baselines using the Atari 100k benchmark (Łukasz Kaiser et al., 2020), a commonly used testbed for assessing the sample-efficiency of RL algorithms. It consists of 26 games from the Arcade Learning Environment (Bellemare et al., 2013), each with distinct settings requiring perception, planning, and control skills. (An interaction-budget sketch follows this list.)
Dataset Splits: No — The paper mentions the Atari 100k benchmark but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for its experiments.
Hardware Specification: No — The paper does not specify hardware; the only related statement is an acknowledgement: "We are also thankful to the Digital Research Alliance of Canada for the computing resources and CIFAR for research funding."
Software Dependencies: No — The paper provides detailed hyperparameters for its models (Tables 6, 7, 8) but does not specify any software dependencies (e.g., libraries, frameworks) with version numbers.
Experiment Setup: Yes — The world model is trained with a GPT-style causal (decoder) transformer, while the policy is trained using a ViT-style (encoder) transformer. A detailed list of hyperparameters is provided for each module: Table 6 for the image tokenizer, Table 7 for world modeling, and Table 8 for behaviour learning. These tables list specific values such as "Embedding dimension 512", "Batch size 64", "Learning rate 0.0001", and "Transformer layers 6". (A config sketch follows below.)
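
For context on the Research Type entry, the reported 0.790 is the median over 26 games of the standard human-normalized score, (agent − random) / (human − random). A minimal sketch of that computation follows; the per-game raw scores below are illustrative placeholders, not results from the paper:

```python
import numpy as np

def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Standard Atari normalization: 0.0 = random play, 1.0 = human level."""
    return (agent - random) / (human - random)

# Illustrative placeholder scores (not from the paper); the benchmark
# aggregates over all 26 Atari 100k games.
raw_scores = {
    "GameA": {"agent": 30.0,  "random": 2.0,   "human": 31.0},
    "GameB": {"agent": 15.0,  "random": -21.0, "human": 15.0},
    "GameC": {"agent": 750.0, "random": 160.0, "human": 13000.0},
}

hns = [human_normalized_score(**s) for s in raw_scores.values()]
print(f"median human-normalized score: {np.median(hns):.3f}")
```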
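
The paper's Algorithm 1 is only named in the excerpt, so its steps are not reproduced here. As a generic illustration of combining a learned token world model with lookahead search, here is a rollout-planning sketch; `world_model.step`, `world_model.value`, and `policy.sample_action` are hypothetical interfaces, not DART's actual API:

```python
import torch

@torch.no_grad()
def lookahead_plan(world_model, policy, obs_tokens,
                   n_candidates: int = 8, horizon: int = 5, gamma: float = 0.99):
    """Score several imagined rollouts and return the best first action.

    All model interfaces here are hypothetical stand-ins:
      world_model.step(tokens, action) -> (next_tokens, reward, done)
      world_model.value(tokens)        -> scalar value estimate
      policy.sample_action(tokens)     -> action
    """
    best_action, best_return = None, float("-inf")
    for _ in range(n_candidates):
        tokens = obs_tokens
        first_action = policy.sample_action(tokens)
        action, ret, done = first_action, 0.0, False
        for t in range(horizon):
            tokens, reward, done = world_model.step(tokens, action)
            ret += (gamma ** t) * reward
            if done:
                break
            action = policy.sample_action(tokens)
        if not done:  # bootstrap the tail of the rollout with a value estimate
            ret += (gamma ** horizon) * world_model.value(tokens)
        if ret > best_return:
            best_action, best_return = first_action, ret
    return best_action
```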
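
On the Open Datasets entry, the Atari 100k protocol caps interaction at 100k agent steps (400k raw frames at the standard frame skip of 4). A minimal sketch of that budget, assuming gymnasium with the ale-py Atari environments installed; the random policy is a placeholder:

```python
import gymnasium as gym  # assumes ale-py Atari environments are installed

# Atari 100k budget: 100k agent steps; the v5 environments default to a
# frame skip of 4, so this corresponds to 400k raw frames.
env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)

for _ in range(100_000):
    action = env.action_space.sample()  # placeholder for the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```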
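
Finally, for the Experiment Setup entry, the quoted hyperparameter values could be grouped into a config object along these lines. The grouping and field names are hypothetical; the excerpt does not say which of Tables 6-8 each value comes from:

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    """Hypothetical grouping; only the values are quoted from the paper's tables."""
    embedding_dim: int = 512      # "Embedding dimension 512"
    n_layers: int = 6             # "Transformer layers 6"
    batch_size: int = 64          # "Batch size 64"
    learning_rate: float = 1e-4   # "Learning rate 0.0001"

world_model_cfg = TransformerConfig()  # GPT-style causal decoder (world model)
policy_cfg = TransformerConfig()       # ViT-style encoder (policy)
print(world_model_cfg)
```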