Transformers are Sample-Efficient World Models
Authors: Vincent Micheli, Eloi Alonso, François Fleuret
ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. |
| Researcher Affiliation | Academia | Vincent Micheli (University of Geneva); Eloi Alonso (University of Geneva); François Fleuret (University of Geneva) |
| Pseudocode | Yes | Algorithm 1 summarizes the training protocol. |
| Open Source Code | Yes | To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris. |
| Open Datasets | Yes | In this work, we focus on the well established Atari 100k benchmark (Kaiser et al., 2020). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets in the traditional supervised learning sense. For the RL benchmark, it describes the evaluation protocol but not fixed data splits. |
| Hardware Specification | Yes | We ran our experiments with 8 Nvidia A100 40GB GPUs. |
| Software Dependencies | No | The paper mentions 'Minimal dependencies are required to run the codebase' but does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | We describe model architectures and list hyperparameters in Appendix A. ... Table 2: Encoder / Decoder hyperparameters. ... Table 3: Embedding table hyperparameters. ... Table 4: Transformer hyperparameters. ... Table 5: Training loop & Shared hyperparameters. ... Table 6: RL training hyperparameters. |
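
The Research Type row quotes a mean human normalized score and a count of games on which the agent outperforms humans. As a rough illustration of how such figures are usually derived on Atari benchmarks, the sketch below assumes the common definition (agent score minus random-policy score, divided by human score minus random-policy score). The function name, game names, and all numbers are hypothetical placeholders, not values from the paper or its codebase.

```python
def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Assumed standard definition: (agent - random) / (human - random)."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return (agent - random) / (human - random)


# Made-up placeholder scores for illustration only (NOT results from the paper).
scores = {
    "game_a": {"agent": 60.0, "random": 10.0, "human": 110.0},
    "game_b": {"agent": 300.0, "random": 0.0, "human": 200.0},
}

# Per-game normalized scores.
hns = {game: human_normalized_score(**s) for game, s in scores.items()}

# Aggregate statistics of the kind quoted in the table.
mean_hns = sum(hns.values()) / len(hns)
superhuman_games = sum(1 for v in hns.values() if v > 1.0)

print(hns)
print(mean_hns, superhuman_games)
```

Under this definition a score of 1.0 matches the human baseline, so a game counts as superhuman when its normalized score exceeds 1.0.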