Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

Authors: Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Thomas Jackson, Samuel Coward, Jakob Nicolaus Foerster

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first present Craftax-Classic: a ground-up rewrite of Crafter in JAX that runs up to 250x faster than the Python-native original. A run of PPO using 1 billion environment interactions finishes in under an hour using only a single GPU and averages 90% of the optimal reward. To provide a more compelling challenge, we present the main Craftax benchmark, a significant extension of the Crafter mechanics with elements inspired by NetHack. Solving Craftax requires deep exploration, long-term planning and memory, as well as continual adaptation to novel situations as more of the world is discovered. We show that existing methods, including global and episodic exploration as well as unsupervised environment design, fail to make material progress on the benchmark.
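To make the speed claim concrete, below is a minimal sketch of how rollouts in a JAX-native environment such as Craftax can be vectorised with jax.vmap and compiled end-to-end, which is what makes 1 billion interactions on a single GPU practical. The import path, environment name, auto_reset flag, and action-space accessor are assumptions based on a Gymnax-style interface and the public repository, not code taken from the paper.

```python
# A minimal sketch of a vectorised Craftax rollout in JAX, assuming a
# Gymnax-style interface (reset/step take a PRNG key, state and params).
# The import path, environment name and auto_reset flag are assumptions,
# not verified against a specific Craftax version.
import jax
from craftax.craftax_env import make_craftax_env_from_name  # assumed entry point

NUM_ENVS = 1024

env = make_craftax_env_from_name("Craftax-Symbolic-v1", auto_reset=True)
env_params = env.default_params

rng = jax.random.PRNGKey(0)
rng, reset_rng = jax.random.split(rng)
reset_keys = jax.random.split(reset_rng, NUM_ENVS)
# Reset all environments in parallel.
obs, state = jax.vmap(env.reset, in_axes=(0, None))(reset_keys, env_params)

def rollout_step(carry, _):
    rng, state = carry
    rng, act_rng, step_rng = jax.random.split(rng, 3)
    # Random actions stand in for a policy network in this sketch.
    actions = jax.random.randint(
        act_rng, (NUM_ENVS,), 0, env.action_space(env_params).n
    )
    step_keys = jax.random.split(step_rng, NUM_ENVS)
    obs, state, reward, done, info = jax.vmap(
        env.step, in_axes=(0, 0, 0, None)
    )(step_keys, state, actions, env_params)
    return (rng, state), reward

# 100 steps across 1024 parallel environments, compiled into a single XLA program.
scan_fn = jax.jit(lambda carry: jax.lax.scan(rollout_step, carry, None, length=100))
(_, _), rewards = scan_fn((rng, state))
print(rewards.shape)  # (100, NUM_ENVS)
```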
Researcher Affiliation | Academia | 1University of Oxford, 2University College London. Correspondence to: Michael Matthews <michael.matthews@eng.ox.ac.uk>.
Pseudocode | No | The paper describes the environment mechanics and experimental setups in natural language and through figures, but it does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code provided at https://github.com/MichaelTMatthews/Craftax.
Open Datasets | Yes | We present the Craftax benchmark: a JAX-based environment exhibiting complex, open-ended dynamics and running orders of magnitude faster than comparable environments. Concretely, we first propose Craftax-Classic, a reimplementation of Crafter (Hafner, 2021) in JAX that runs 250 times faster than the Python-native original.
Dataset Splits | No | The paper defines two benchmarks, Craftax-1B and Craftax-1M, which specify total environment interactions for evaluation rather than distinct training, validation, and test splits in the conventional supervised learning sense. It describes evaluating 'saved checkpoints on a fixed set of 20 evaluation levels' for UED, but this refers to the testing phase rather than a separate validation split used during training.
Hardware Specification | Yes | All experiments were run on a single machine with a GeForce RTX 4090 (24GB of VRAM), i9-13900K (24 cores with 32 threads) and 32GB of RAM.
Software Dependencies | No | The paper mentions several software components, including JAX, PureJaxRL, stable-baselines3, Gymnax, and JaxUED, along with citations to their respective papers. However, it does not provide specific version numbers for these dependencies, which would be necessary for exact reproducibility.
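Because the paper does not pin versions, a reproduction would need to record them itself. A minimal sketch using Python's importlib.metadata is shown below; the distribution names are the obvious PyPI names for the cited projects and may not cover all of them (for example, PureJaxRL is distributed as a repository rather than a package).

```python
# Record the installed versions of the cited dependencies at run time,
# since the paper itself does not pin them. Package names are assumed
# PyPI distribution names and may not match every cited project.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["jax", "jaxlib", "craftax", "gymnax", "stable-baselines3"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```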
Experiment Setup | Yes | The hyperparameters and considered values for PPO on Craftax-1B are shown in Table 7. ... The hyperparameters for ICM, E3B and RND are shown in Tables 8, 9 and 10 respectively. ... The hyperparameters for Craftax-1M were tuned more thoroughly... The hyperparameters for PPO, ICM and E3B are shown in Tables 11, 12 and 13 respectively. Table 14 contains the UED hyperparameters used.
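The tables themselves live in the paper's appendix and are not reproduced here. As a rough sketch of the kind of configuration they specify, the dataclass below uses standard PPO field names with placeholder values; only the 1 billion total-timestep budget comes from the text, everything else is illustrative rather than the paper's tuned numbers.

```python
# Illustrative sketch only: field names follow standard PPO conventions and
# the default values are placeholders, not the tuned numbers from Tables 7-14.
from dataclasses import dataclass


@dataclass
class PPOConfig:
    total_timesteps: int = 1_000_000_000  # the Craftax-1B interaction budget
    num_envs: int = 1024                  # parallel environments (placeholder)
    num_steps: int = 64                   # rollout length per update (placeholder)
    lr: float = 3e-4
    gamma: float = 0.99
    gae_lambda: float = 0.95
    clip_eps: float = 0.2
    ent_coef: float = 0.01
    vf_coef: float = 0.5
    num_minibatches: int = 8
    update_epochs: int = 4
```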