Transformers are Sample-Efficient World Models

Authors: Vincent Micheli, Eloi Alonso, François Fleuret

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human-normalized score of 1.046 and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. (Score aggregation is sketched below the table.)
Researcher Affiliation | Academia | Vincent Micheli (University of Geneva), Eloi Alonso (University of Geneva), François Fleuret (University of Geneva)
Pseudocode | Yes | Algorithm 1 summarizes the training protocol.
Open Source Code | Yes | To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.
Open Datasets | Yes | In this work, we focus on the well-established Atari 100k benchmark (Kaiser et al., 2020). (The interaction budget is sketched below the table.)
Dataset Splits | No | The paper does not provide dataset split information in the traditional supervised-learning sense (exact percentages, sample counts, citations to predefined splits, or a splitting methodology). For the RL benchmark, it describes the evaluation protocol but not fixed data splits.
Hardware Specification | Yes | We ran our experiments with 8 Nvidia A100 40GB GPUs.
Software Dependencies | No | The paper states that "Minimal dependencies are required to run the codebase" but does not name specific software with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | We describe model architectures and list hyperparameters in Appendix A. ... Table 2: Encoder / Decoder hyperparameters. ... Table 3: Embedding table hyperparameters. ... Table 4: Transformer hyperparameters. ... Table 5: Training loop & Shared hyperparameters. ... Table 6: RL training hyperparameters.
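The headline result quoted in the Research Type row uses the standard Atari 100k convention of normalizing each game's raw score against random and human reference scores before averaging across games. The sketch below shows that aggregation; the per-game numbers and the games dictionary are illustrative placeholders, not values reported in the paper.

    # Minimal sketch of the mean human-normalized score (HNS) aggregation used in
    # Atari 100k style reporting. The per-game reference values and agent scores
    # below are placeholders, not numbers from the paper.

    def human_normalized_score(agent: float, random: float, human: float) -> float:
        """HNS = (agent - random) / (human - random)."""
        return (agent - random) / (human - random)

    # Hypothetical per-game records: (agent score, random baseline, human baseline).
    games = {
        "Breakout": (84.0, 1.7, 30.5),
        "Pong": (14.6, -20.7, 14.6),
    }

    scores = [human_normalized_score(a, r, h) for a, r, h in games.values()]
    mean_hns = sum(scores) / len(scores)
    superhuman = sum(s >= 1.0 for s in scores)  # games at or above the human baseline
    print(f"mean HNS = {mean_hns:.3f}, superhuman on {superhuman}/{len(scores)} games")

A score of 1.0 on a game means human-level performance, which is why "outperforms humans on 10 out of 26 games" corresponds to per-game scores above 1.0.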
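The Open Datasets row refers to the Atari 100k benchmark, which caps the agent at 100,000 environment interactions per game (about 400k frames at a frame skip of 4, i.e. roughly two hours of gameplay). Below is a minimal sketch of that interaction budget, assuming the gymnasium/ALE packaging of the Atari games rather than the paper's own wrapper stack; the environment id and the random policy are placeholders.

    # Minimal sketch of the Atari 100k interaction budget (not the paper's code).
    import gymnasium as gym

    BUDGET = 100_000  # 100k agent actions ~ 400k frames with frameskip 4

    env = gym.make("ALE/Breakout-v5", frameskip=4)
    obs, info = env.reset(seed=0)

    for step in range(BUDGET):
        action = env.action_space.sample()  # placeholder for the learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()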