Transformers are Sample-Efficient World Models

Authors: Vincent Micheli, Eloi Alonso, François Fleuret

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human-normalized score of 1.046 and outperforms humans on 10 out of 26 games, setting a new state of the art for methods without lookahead search. (Score aggregation is sketched below the table.)
Researcher Affiliation | Academia | Vincent Micheli (University of Geneva), Eloi Alonso (University of Geneva), François Fleuret (University of Geneva)
Pseudocode | Yes | Algorithm 1 summarizes the training protocol.
Open Source Code | Yes | To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our code and models at https://github.com/eloialonso/iris.
Open Datasets | Yes | In this work, we focus on the well-established Atari 100k benchmark (Kaiser et al., 2020). (The interaction budget is sketched below the table.)
Dataset Splits | No | The paper does not provide dataset split information in the traditional supervised-learning sense (exact percentages, sample counts, citations to predefined splits, or a splitting methodology). For the RL benchmark, it describes the evaluation protocol but not fixed data splits.
Hardware Specification | Yes | We ran our experiments with 8 Nvidia A100 40GB GPUs.
Software Dependencies | No | The paper states that "Minimal dependencies are required to run the codebase" but does not name specific software with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | We describe model architectures and list hyperparameters in Appendix A. ... Table 2: Encoder / Decoder hyperparameters. ... Table 3: Embedding table hyperparameters. ... Table 4: Transformer hyperparameters. ... Table 5: Training loop & Shared hyperparameters. ... Table 6: RL training hyperparameters.
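The headline result quoted in the Research Type row uses the standard Atari 100k convention of normalizing each game's raw score against random and human reference scores before averaging across games. The sketch below shows that aggregation; the per-game numbers and the games dictionary are illustrative placeholders, not values reported in the paper.

    # Minimal sketch of the mean human-normalized score (HNS) aggregation used in
    # Atari 100k style reporting. The per-game reference values and agent scores
    # below are placeholders, not numbers from the paper.

    def human_normalized_score(agent: float, random: float, human: float) -> float:
        """HNS = (agent - random) / (human - random)."""
        return (agent - random) / (human - random)

    # Hypothetical per-game records: (agent score, random baseline, human baseline).
    games = {
        "Breakout": (84.0, 1.7, 30.5),
        "Pong": (14.6, -20.7, 14.6),
    }

    scores = [human_normalized_score(a, r, h) for a, r, h in games.values()]
    mean_hns = sum(scores) / len(scores)
    superhuman = sum(s >= 1.0 for s in scores)  # games at or above the human baseline
    print(f"mean HNS = {mean_hns:.3f}, superhuman on {superhuman}/{len(scores)} games")

A score of 1.0 on a game means human-level performance, which is why "outperforms humans on 10 out of 26 games" corresponds to per-game scores above 1.0.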
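The Open Datasets row refers to the Atari 100k benchmark, which caps the agent at 100,000 environment interactions per game (about 400k frames at a frame skip of 4, i.e. roughly two hours of gameplay). Below is a minimal sketch of that interaction budget, assuming the gymnasium/ALE packaging of the Atari games rather than the paper's own wrapper stack; the environment id and the random policy are placeholders.

    # Minimal sketch of the Atari 100k interaction budget (not the paper's code).
    import gymnasium as gym

    BUDGET = 100_000  # 100k agent actions ~ 400k frames with frameskip 4

    env = gym.make("ALE/Breakout-v5", frameskip=4)
    obs, info = env.reset(seed=0)

    for step in range(BUDGET):
        action = env.action_space.sample()  # placeholder for the learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()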