Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Transformer-based World Models with Contrastive Predictive Coding
Authors: Maxime Burchi, Radu Timofte
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER. In this section, we describe our experiments on the commonly used Atari 100k benchmark. We compare TWISTER with SimPLe, DreamerV3 and recent Transformer model-based approaches in Table 2. We also perform several ablation studies on the principal components of TWISTER. |
| Researcher Affiliation | Academia | Maxime Burchi, Radu Timofte; Computer Vision Lab, CAIDAS & IFI, University of Würzburg, Germany |
| Pseudocode | No | The paper describes the architecture and optimization process of the proposed Transformer-based world model with contrastive representations using equations and textual descriptions, but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ look-ahead search. We release our code at https://github.com/burchim/TWISTER. |
| Open Datasets | Yes | TWISTER achieves a human-normalized mean score of 162% on the Atari 100k benchmark... The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in low data regime. |
| Dataset Splits | No | The Atari 100k benchmark was proposed in Kaiser et al. (2020) to evaluate reinforcement learning agents on Atari games in low data regime. The benchmark includes 26 Atari games with a budget of 400k environment frames, amounting to 100k interactions between the agent and the environment using the default action repeat setting. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Table 9: TWISTER hyper-parameters. We apply the same hyper-parameters to all Atari games. General: Batch Size B = 16; Sequence Length T = 64; Optimizer: Adam (Kingma & Ba, 2014); Image Resolution 64×64 (RGB); Training Step per Policy Step 1; Environment Instances 1. Transformer Network: Transformer Blocks N = 4; Number of Attention Heads 8; Dropout Probability 0.1; Attention Context Length 8. World Model: Stochastic State Features 32; Classes per Feature 32; Dynamics Loss Scale β_dyn = 0.5; Representation Loss Scale β_reg = 0.1; AC-CPC Steps K = 10; Random Crop & Resize Scale (0.25, 1.0); Random Crop & Resize Ratio (0.75, 1.33); Learning Rate α = 10⁻⁴; Adam Betas β₁, β₂ = 0.9, 0.999; Adam Epsilon ϵ = 10⁻⁸; Gradient Clipping 1000. Actor Critic: Imagination Horizon H = 15; Return Discount γ = 0.997; Return Lambda λ = 0.95; Critic EMA Decay 0.98; Return Normalization Momentum 0.99; Actor Entropy Scale η = 3×10⁻⁴; Learning Rate α = 3×10⁻⁵; Adam Betas β₁, β₂ = 0.9, 0.999; Adam Epsilon ϵ = 10⁻⁵; Gradient Clipping 100. |
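
The Research Type row reports a human-normalized mean score of 162%. As a minimal sketch, assuming the standard Atari convention, the human-normalized score (HNS) of one game is (agent - random) / (human - random), and the benchmark figure is the mean HNS over all games, expressed as a percentage. The function names and the toy scores below are hypothetical illustrations, not values from the paper.

```python
from statistics import mean


def human_normalized_score(agent: float, random_score: float, human_score: float) -> float:
    """Human-normalized score (HNS) for a single game.

    0.0 corresponds to random play and 1.0 to the human reference score.
    """
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return (agent - random_score) / (human_score - random_score)


def mean_hns_percent(results: dict[str, tuple[float, float, float]]) -> float:
    """Mean HNS over games, as a percentage.

    `results` maps a game name to (agent, random, human) raw scores.
    """
    return 100.0 * mean(
        human_normalized_score(agent, rnd, hum) for agent, rnd, hum in results.values()
    )


# Toy scores for two hypothetical games (not results from the paper).
toy = {
    "game_a": (150.0, 0.0, 100.0),  # HNS = 1.5
    "game_b": (60.0, 10.0, 110.0),  # HNS = 0.5
}
print(f"{mean_hns_percent(toy):.1f}%")  # prints "100.0%"
```

Because the mean is taken over per-game ratios, a single game where the agent far exceeds the human reference can pull the mean well above 100%, which is why Atari papers often also report the median.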
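
The Dataset Splits row quotes a budget of 400k environment frames per game, which amounts to 100k agent-environment interactions because the Atari 100k setting repeats each action for 4 emulator frames. The arithmetic is sketched below; the constant and function names are illustrative assumptions.

```python
# Atari 100k sample budget: each agent action is repeated for several emulator frames.
ACTION_REPEAT = 4        # default action repeat (frame skip) in the Atari 100k setting
FRAME_BUDGET = 400_000   # environment frames allowed per game


def interaction_budget(frames: int, action_repeat: int) -> int:
    """Number of agent decisions (policy steps) that fit in a frame budget."""
    return frames // action_repeat


steps = interaction_budget(FRAME_BUDGET, ACTION_REPEAT)
print(steps)  # prints 100000
```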
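
The Experiment Setup row flattens Table 9 into a single cell. One way to keep these settings together in a typed object is sketched below as Python dataclasses. The class and field names are hypothetical and are not taken from the TWISTER repository; only the numeric values come from Table 9.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OptimizerConfig:
    """Adam settings for one network (hypothetical container, values from Table 9)."""
    learning_rate: float
    betas: tuple[float, float] = (0.9, 0.999)
    eps: float = 1e-8
    grad_clip: float = 1000.0


@dataclass(frozen=True)
class TwisterConfig:
    # General
    batch_size: int = 16                            # B
    sequence_length: int = 64                       # T
    image_resolution: tuple[int, int] = (64, 64)    # RGB frames
    train_steps_per_policy_step: int = 1
    env_instances: int = 1
    # Transformer network
    transformer_blocks: int = 4                     # N
    attention_heads: int = 8
    dropout: float = 0.1
    attention_context_length: int = 8
    # World model
    stoch_features: int = 32
    classes_per_feature: int = 32
    beta_dyn: float = 0.5
    beta_reg: float = 0.1
    ac_cpc_steps: int = 10                          # K
    crop_scale: tuple[float, float] = (0.25, 1.0)
    crop_ratio: tuple[float, float] = (0.75, 1.33)
    world_model_opt: OptimizerConfig = field(
        default_factory=lambda: OptimizerConfig(learning_rate=1e-4, eps=1e-8, grad_clip=1000.0)
    )
    # Actor-critic
    imagination_horizon: int = 15                   # H
    discount: float = 0.997                         # gamma
    return_lambda: float = 0.95                     # lambda
    critic_ema_decay: float = 0.98
    return_norm_momentum: float = 0.99
    actor_entropy_scale: float = 3e-4               # eta
    actor_critic_opt: OptimizerConfig = field(
        default_factory=lambda: OptimizerConfig(learning_rate=3e-5, eps=1e-5, grad_clip=100.0)
    )


cfg = TwisterConfig()
# 32 categorical features with 32 classes each give a 1024-dim one-hot stochastic state.
print(cfg.stoch_features * cfg.classes_per_feature)  # prints 1024
```

Freezing the dataclasses keeps a run's settings immutable once created, so the same object can be logged alongside results for the 26 games without risk of accidental edits.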