Improving Token-Based World Models with Parallel Observation Prediction

Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each entry below gives a reproducibility variable, its result, and the LLM response supporting that result.

Research Type: Experimental
LLM Response: "We evaluate REM on the Atari 100K benchmark, demonstrating the effectiveness of POP. POP leads to a 15.4x speed-up at imagination and trains in under 12 hours, while outperforming prior TBWMs." (Section 3: Experiments)

Researcher Affiliation: Collaboration
LLM Response: "(1) Technion - Israel Institute of Technology, (2) ByteDance."

Pseudocode: Yes
LLM Response: "A pseudocode algorithm of REM is presented in Appendix A.2. A pseudocode of our novel POP extension of RetNet is given in Algorithms 1 and 2."

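The paper's Algorithms 1 and 2 are not reproduced in this report. As a rough illustration of the parallel-prediction idea behind POP, the sketch below predicts all K tokens of the next observation in a single forward pass from a block of learned queries. The module names, shapes, and attention-based head are assumptions made for illustration, not the authors' RetNet-based mechanism.

```python
# Illustrative sketch only, NOT the paper's Algorithms 1-2. REM builds its
# POP mechanism on a RetNet backbone; this stand-in uses plain attention.
import torch
import torch.nn as nn

class ParallelObsHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, obs_tokens: int):
        super().__init__()
        # One learned query per observation token: all K tokens of the next
        # observation are predicted together instead of K sequential passes.
        self.queries = nn.Parameter(torch.randn(obs_tokens, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (B, T, d_model) features of the token sequence so far.
        B = context.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, K, d_model)
        out, _ = self.attn(q, context, context)          # one cross-attention pass
        return self.head(out)                            # (B, K, vocab) logits

# Usage: one forward pass yields logits for every token of the next frame.
head = ParallelObsHead(d_model=64, vocab_size=512, obs_tokens=16)
logits = head(torch.randn(2, 20, 64))  # torch.Size([2, 16, 512])
```
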
Open Source Code: Yes
LLM Response: "Our code is available at https://github.com/leor-c/REM."

Open Datasets: Yes
LLM Response: "We follow most prior works on world models and evaluate REM on the widely-recognized Atari 100K benchmark (Kaiser et al., 2020) for sample-efficient reinforcement learning."

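Atari 100K is an interaction budget rather than a fixed dataset: the agent may take at most 100K environment steps (400K raw frames at frameskip 4). The sketch below shows one way to enforce that budget with Gymnasium's ALE bindings; the environment id and random policy are assumptions, not drawn from the REM codebase.

```python
# Minimal sketch of the Atari 100K interaction budget (requires gymnasium
# with the ale-py Atari extras installed).
import gymnasium as gym

env = gym.make("ALE/Breakout-v5")  # v5 applies frameskip=4 by default
obs, info = env.reset(seed=0)
budget = 100_000                   # 100K agent steps = 400K raw frames

for step in range(budget):
    action = env.action_space.sample()  # stand-in for the learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
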
Dataset Splits: No
LLM Response: The paper evaluates on the Atari 100K benchmark with a fixed interaction limit of 100K steps and uses 100 episodes for evaluation, but it does not specify explicit training/validation/test dataset splits as percentages or sample counts for the data used to train the world model or controller.

Hardware Specification: Yes
LLM Response: "For benchmarking agents' run times, we used a workstation with an Nvidia RTX 4090 GPU. The rest of our experiments were conducted on Nvidia V100 GPUs."

Software Dependencies: No
LLM Response: The paper mentions software components such as the 'Yet-Another-RetNet' RetNet implementation and PyTorch, but does not specify their version numbers or the versions of any other key libraries or frameworks used.

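Where versions go unreported, they can at least be recovered from the running environment. The snippet below logs the versions of the components the paper names; the PyPI package name yet-another-retnet is an assumption based on the implementation the paper cites.

```python
# Log the versions of the dependencies this report finds unreported.
import sys
import torch
from importlib.metadata import version, PackageNotFoundError

print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)  # None on CPU-only builds
try:
    # Package name is an assumption based on the cited implementation.
    print("yet-another-retnet:", version("yet-another-retnet"))
except PackageNotFoundError:
    print("yet-another-retnet: not installed")
```
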
Experiment Setup: Yes
LLM Response: "Tables 2 and 3 detail hyperparameters of the optimization and environment, as well as hyperparameters shared by multiple components. The hyperparameters of M are presented in Table 5. The values of the hyperparameters used in our experiments are detailed in Table 7."

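As a reproducibility aid, hyperparameters scattered across several tables can be consolidated into a single logged config. The dataclass below is a hypothetical pattern: only env_steps and eval_episodes come from this report, and the remaining values are placeholders standing in for the paper's Tables 2, 3, 5, and 7.

```python
# Hypothetical config pattern; placeholder values are NOT the paper's.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ExperimentConfig:
    env_steps: int = 100_000     # Atari 100K interaction budget
    eval_episodes: int = 100     # evaluation protocol noted above
    learning_rate: float = 1e-4  # placeholder; see the paper's Table 2
    batch_size: int = 32         # placeholder; see the paper's Table 2

print(asdict(ExperimentConfig()))  # log the full setup alongside results
```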