Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Improving Token-Based World Models with Parallel Observation Prediction

Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate REM on the Atari 100K benchmark, demonstrating the effectiveness of POP. POP leads to a 15.4x speed-up at imagination and trains in under 12 hours, while outperforming prior TBWMs. 3. Experiments
Researcher Affiliation | Collaboration | (1) Technion Israel Institute of Technology; (2) ByteDance.
Pseudocode | Yes | A pseudo-code algorithm of REM is presented in Appendix A.2. A pseudo code of our novel POP extension of RetNet is given in Algorithms 1 and 2.
Open Source Code | Yes | Our code is available at https://github.com/leor-c/REM.
Open Datasets | Yes | We follow most prior works on world models and evaluate REM on the widely-recognized Atari 100K benchmark (Kaiser et al., 2020) for sample-efficient reinforcement learning.
Dataset Splits | No | The paper evaluates on the Atari 100K benchmark with a fixed interaction limit of 100K steps and uses 100 episodes for evaluation, but it does not specify explicit training/validation/test dataset splits as percentages or sample counts for the data used to train the world model or controller.
Hardware Specification | Yes | For benchmarking agents run times, we used a workstation with an Nvidia RTX 4090 GPU. The rest of our experiments were conducted on Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions software components like 'Yet-Another-RetNet RetNet implementation' and 'PyTorch', but does not specify their version numbers or the versions of any other key libraries or frameworks used.
Experiment Setup | Yes | Tables 2 and 3 detail hyperparameters of the optimization and environment, as well as hyperparameters shared by multiple components. The hyperparameters of M are presented in Table 5. The values of the hyperparameters used in our experiments are detailed in Table 7.