Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improving Token-Based World Models with Parallel Observation Prediction
Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate REM on the Atari 100K benchmark, demonstrating the effectiveness of POP. POP leads to a 15.4x speed-up at imagination and trains in under 12 hours, while outperforming prior TBWMs. 3. Experiments |
| Researcher Affiliation | Collaboration | ¹Technion, Israel Institute of Technology; ²ByteDance. |
| Pseudocode | Yes | A pseudo-code algorithm of REM is presented in Appendix A.2. A pseudo code of our novel POP extension of RetNet is given in Algorithms 1 and 2. |
| Open Source Code | Yes | Our code is available at https://github.com/leor-c/REM. |
| Open Datasets | Yes | We follow most prior works on world models and evaluate REM on the widely-recognized Atari 100K benchmark (Kaiser et al., 2020) for sample-efficient reinforcement learning. |
| Dataset Splits | No | The paper evaluates on the Atari 100K benchmark with a fixed interaction limit of 100K steps and uses 100 episodes for evaluation, but it does not specify explicit training/validation/test dataset splits as percentages or sample counts for the data used to train the world model or controller. |
| Hardware Specification | Yes | For benchmarking agents run times, we used a workstation with an Nvidia RTX 4090 GPU. The rest of our experiments were conducted on Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Yet-Another-RetNet RetNet implementation' and 'PyTorch', but does not specify their version numbers or the versions of any other key libraries or frameworks used. |
| Experiment Setup | Yes | Tables 2 and 3 detail hyperparameters of the optimization and environment, as well as hyperparameters shared by multiple components. The hyperparameters of M are presented in Table 5. The values of the hyperparameters used in our experiments are detailed in Table 7. |