Transformer-based World Models Are Happy With 100k Interactions
Authors: Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our transformer-based world model (TWM) generates meaningful, new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark." See also Section 3 ("Experiments"). |
| Researcher Affiliation | Academia | Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling Department of Computer Science, Technical University of Dortmund, Germany |
| Pseudocode | Yes | In Algorithm 1 we present pseudocode for training the world model and the policy. |
| Open Source Code | Yes | Our code is available at https://github.com/jrobine/twm. |
| Open Datasets | Yes | To compare data-efficient reinforcement learning algorithms, Kaiser et al. (2020) proposed the Atari 100k benchmark, which uses a subset of 26 Atari games from the Arcade Learning Environment (Bellemare et al., 2013) and limits the number of interactions per game to 100K. |
| Dataset Splits | No | The paper describes collecting experience into a dataset D and sampling from it for training. It does not explicitly define distinct training, validation, and test dataset splits with percentages or counts for reproducibility of the data partitioning. |
| Hardware Specification | Yes | "For each run, we give the agent a total training and evaluation budget of roughly 10 hours on a single NVIDIA A100 GPU. The time can vary slightly, since the budget is based on the number of updates. An NVIDIA GeForce RTX 3090 requires 12-13 hours for the same amount of training and evaluation. When using a vanilla transformer, which does not use the memory mechanism of the Transformer-XL architecture (Dai et al., 2019), the runtime is roughly 15.5 hours on an NVIDIA A100 GPU, i.e., 1.5 times higher. The throughputs were measured on an NVIDIA A100 GPU and are given in (approximate) samples per second: [...] All runtimes are measured on a single NVIDIA P100 GPU." |
| Software Dependencies | No | The paper refers to various architectural components and methods (e.g., Transformer-XL, SiLU activation function) and cites related work (e.g., Dreamer V2), but it does not explicitly list specific software dependencies with their version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1'). |
| Experiment Setup | Yes | "In Table 4 we summarize all hyperparameters that we used in our experiments." The table then lists specific values, e.g., dataset sampling temperature τ = 20, discount factor γ = 0.99, world model batch size N = 100, imagination horizon H = 15, and observation learning rate 0.0001. |
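To illustrate how the hyperparameters quoted from Table 4 could be pinned down for a reproduction attempt, here is a minimal configuration sketch. The dataclass and its field names are our own illustration, not taken from the paper's released code; only the numeric values come from the quoted evidence above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TWMConfig:
    """Hyperparameters quoted from Table 4 of the TWM paper.

    Field names are illustrative; the paper uses the symbols
    noted in the comments.
    """
    dataset_sampling_temperature: float = 20.0  # τ
    discount_factor: float = 0.99               # γ
    world_model_batch_size: int = 100           # N
    imagination_horizon: int = 15               # H
    observation_learning_rate: float = 1e-4


config = TWMConfig()
print(config.discount_factor)      # → 0.99
print(config.imagination_horizon)  # → 15
```

Freezing the dataclass makes the configuration immutable, which helps guarantee that all runs in a reproduction use the exact same settings.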