Improving Token-Based World Models with Parallel Observation Prediction
Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate REM on the Atari 100K benchmark, demonstrating the effectiveness of POP. POP leads to a 15.4x speed-up at imagination and trains in under 12 hours, while outperforming prior TBWMs. (Section 3, Experiments) |
| Researcher Affiliation | Collaboration | (1) Technion, Israel Institute of Technology; (2) ByteDance. |
| Pseudocode | Yes | A pseudo-code algorithm of REM is presented in Appendix A.2. A pseudo-code of our novel POP extension of RetNet is given in Algorithms 1 and 2. |
| Open Source Code | Yes | Our code is available at https://github.com/leor-c/REM. |
| Open Datasets | Yes | We follow most prior works on world models and evaluate REM on the widely-recognized Atari 100K benchmark (Kaiser et al., 2020) for sample-efficient reinforcement learning. |
| Dataset Splits | No | The paper evaluates on the Atari 100K benchmark with a fixed interaction limit of 100K steps and uses 100 episodes for evaluation, but it does not specify explicit training/validation/test dataset splits as percentages or sample counts for the data used to train the world model or controller. |
| Hardware Specification | Yes | For benchmarking agents run times, we used a workstation with an Nvidia RTX 4090 GPU. The rest of our experiments were conducted on Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like the 'Yet-Another-RetNet' RetNet implementation and 'PyTorch', but does not specify their version numbers or the versions of any other key libraries or frameworks used. |
| Experiment Setup | Yes | Tables 2 and 3 detail hyperparameters of the optimization and environment, as well as hyperparameters shared by multiple components. The hyperparameters of M are presented in Table 5. The values of the hyperparameters used in our experiments are detailed in Table 7. |