Transformer-based World Models Are Happy With 100k Interactions

Authors: Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling

ICLR 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Our transformer-based world model (TWM) generates meaningful, new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark." See also Section 3, "Experiments".

Researcher Affiliation: Academia. "Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling, Department of Computer Science, Technical University of Dortmund, Germany"

Pseudocode: Yes. "In Algorithm 1 we present pseudocode for training the world model and the policy."

Open Source Code: Yes. "Our code is available at https://github.com/jrobine/twm."

Open Datasets: Yes. "To compare data-efficient reinforcement learning algorithms, Kaiser et al. (2020) proposed the Atari 100k benchmark, which uses a subset of 26 Atari games from the Arcade Learning Environment (Bellemare et al., 2013) and limits the number of interactions per game to 100K."

Dataset Splits: No. The paper describes collecting experience into a dataset D and sampling from it for training, but it does not explicitly define distinct training, validation, and test splits with percentages or counts, so the data partitioning is not reproducible from the text.
Hardware Specification: Yes. "For each run, we give the agent a total training and evaluation budget of roughly 10 hours on a single NVIDIA A100 GPU. The time can vary slightly, since the budget is based on the number of updates. An NVIDIA GeForce RTX 3090 requires 12-13 hours for the same amount of training and evaluation. When using a vanilla transformer, which does not use the memory mechanism of the Transformer-XL architecture (Dai et al., 2019), the runtime is roughly 15.5 hours on an NVIDIA A100 GPU, i.e., 1.5 times higher. The throughputs were measured on an NVIDIA A100 GPU and are given in (approximate) samples per second: [...] All runtimes are measured on a single NVIDIA P100 GPU."

Software Dependencies: No. The paper refers to various architectural components and methods (e.g., the Transformer-XL architecture, the SiLU activation function) and cites related work (e.g., DreamerV2), but it does not explicitly list specific software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9, and CUDA 11.1").

Experiment Setup: Yes. "In Table 4 we summarize all hyperparameters that we used in our experiments." The table then lists specific values such as dataset sampling temperature τ = 20, discount factor γ = 0.99, world model batch size N = 100, imagination horizon H = 15, and observation learning rate 0.0001.
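For quick reference, the hyperparameter values quoted from Table 4 can be collected into a small configuration sketch. Only the numeric values come from the paper; the dictionary keys are illustrative names, not the paper's exact identifiers.

```python
# Hyperparameters quoted from Table 4 of the TWM paper.
# Key names are illustrative, not the paper's exact identifiers.
TWM_HYPERPARAMS = {
    "dataset_sampling_temperature": 20,   # tau
    "discount_factor": 0.99,              # gamma
    "world_model_batch_size": 100,        # N
    "imagination_horizon": 15,            # H
    "observation_learning_rate": 1e-4,
}
```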
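The quoted pseudocode reference ("pseudocode for training the world model and the policy") suggests an alternating loop of experience collection, world-model learning, and policy learning in imagination, bounded by the Atari 100k interaction budget. A minimal sketch of such a loop, under assumed interfaces; none of the helper names (`act`, `imagine`, `sample`, etc.) come from the paper, and only the budget and the Table 4 values (N = 100, H = 15, γ = 0.99) are quoted:

```python
def train_agent(env, world_model, policy, dataset, total_env_steps=100_000):
    """Hypothetical sketch of alternating world-model and policy training
    from a shared experience dataset. All object interfaces here are
    illustrative assumptions, not the paper's actual API."""
    obs = env.reset()
    for _ in range(total_env_steps):  # Atari 100k interaction budget
        # 1) Collect one step of real experience with the current policy.
        action = policy.act(obs)
        next_obs, reward, done = env.step(action)
        dataset.add(obs, action, reward, done)
        obs = env.reset() if done else next_obs

        # 2) Update the world model on batches sampled from the dataset
        #    (batch size N = 100, quoted from Table 4).
        world_model.update(dataset.sample(batch_size=100))

        # 3) Train the policy on trajectories imagined by the world model
        #    (imagination horizon H = 15, discount gamma = 0.99).
        imagined = world_model.imagine(policy, horizon=15)
        policy.update(imagined, discount=0.99)
    return policy
```

In practice such loops usually interleave several environment steps per gradient update; the one-to-one alternation above is only for readability.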