Learning to Play Atari in a World of Tokens
Authors: Pranav Agarwal, Sheldon Andrews, Samira Ebrahimi Kahou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. |
| Researcher Affiliation | Collaboration | 1École de Technologie Supérieure, Canada; 2Mila; 3Roblox, USA; 4University of Calgary, Canada; 5Canada CIFAR AI Chair. Correspondence to: Pranav Agarwal <pranav.agarwal.1@ens.etsmtl.ca>. |
| Pseudocode | Yes | Algorithm 1 Integrating DART with lookahead search methods; Algorithm 2 Modeling DART for continuous action space. |
| Open Source Code | Yes | We release our code at https://pranaval.github.io/DART/. |
| Open Datasets | Yes | We evaluated our model alongside existing baselines using the Atari 100k benchmark (Łukasz Kaiser et al., 2020), a commonly used testbed for assessing the sample-efficiency of RL algorithms. It consists of 26 games from the Arcade Learning Environment (Bellemare et al., 2013), each with distinct settings requiring perception, planning, and control skills. |
| Dataset Splits | No | The paper mentions the Atari 100k benchmark but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for its experiments. |
| Hardware Specification | No | We are also thankful to the Digital Research Alliance of Canada for the computing resources and CIFAR for research funding. |
| Software Dependencies | No | The paper provides detailed hyperparameters for its models (Tables 6, 7, 8) but does not specify any software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | The world model is trained with a GPT-style causal (decoder) transformer, while the policy is trained using a ViT-style (encoder) transformer. A detailed list of hyperparameters is provided for each module: Table 6 for the image tokenizer, Table 7 for world modeling, and Table 8 for behaviour learning. These tables list specific values such as "Embedding dimension 512", "Batch size 64", "Learning rate 0.0001", and "Transformer layers 6". |
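The headline result quoted above uses the median human-normalized score, the standard Atari metric: an agent's raw score rescaled so that 0.0 corresponds to random play and 1.0 to the human reference. A minimal sketch of how that statistic is computed (the per-game reference scores below are illustrative placeholders, not the published Atari 100k values):

```python
import numpy as np

def human_normalized_score(agent, random, human):
    """Standard Atari normalization: (agent - random) / (human - random)."""
    return (agent - random) / (human - random)

# Hypothetical raw scores for three made-up games, for illustration only.
agent_scores = np.array([500.0, 30.0, 12000.0])
random_scores = np.array([100.0, 0.0, 2000.0])
human_scores = np.array([7000.0, 20.0, 10000.0])

scores = human_normalized_score(agent_scores, random_scores, human_scores)
median = float(np.median(scores))
# A normalized score > 1.0 means the agent beat the human reference on that game,
# which is how a count like "beats humans in 9 out of 26 games" is derived.
games_beating_human = int((scores > 1.0).sum())
```

Aggregating with the median (rather than the mean) keeps a single high-scoring outlier game from dominating the summary statistic.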
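The setup row's distinction between a GPT-style causal (decoder) transformer for world modeling and a ViT-style (encoder) transformer for the policy comes down to the attention mask: the decoder may only attend to past tokens, while the encoder attends bidirectionally. A minimal NumPy sketch of the two mask patterns (illustrative only, not the authors' implementation):

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean mask where entry [i, j] = True means token j is visible to token i.

    causal=True  -> GPT-style decoder: each token sees only itself and the past.
    causal=False -> ViT-style encoder: every token sees every other token.
    """
    if causal:
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    return np.ones((seq_len, seq_len), dtype=bool)

decoder_mask = attention_mask(4, causal=True)   # lower-triangular
encoder_mask = attention_mask(4, causal=False)  # all-ones
```

The causal mask is what lets the world model be trained autoregressively to predict the next token, whereas the policy's encoder can pool information from the whole token sequence at once.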