Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Bootstrap Off-policy with World Model
Authors: Guojian Zhan, Likun Wang, Xiangteng Zhang, Jiaxin Gao, Masayoshi Tomizuka, Shengbo Eben Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the high-dimensional DeepMind Control Suite and Humanoid-Bench show that BOOM achieves state-of-the-art results in both training stability and final performance. The paper includes a dedicated '4 Experiments' section with '4.1 Experimental Setup', '4.2 Experimental Results', and '4.3 Ablation Study', featuring performance curves in Figure 2 and numerical results in Table 1. |
| Researcher Affiliation | Academia | Guojian Zhan (1, 2), Likun Wang (1), Xiangteng Zhang (1), Jiaxin Gao (1), Masayoshi Tomizuka (2), Shengbo Eben Li (1); (1) College of AI & School of Vehicle and Mobility, Tsinghua University; (2) Berkeley AI Research (BAIR), UC Berkeley. The email 'EMAIL' also confirms academic affiliation. |
| Pseudocode | Yes | The paper contains a clearly labeled algorithm block: 'Algorithm 1 BOOM: Bootstrap Off-policy with World Model' on page 5. |
| Open Source Code | Yes | The code is accessible at https://github.com/molumitu/BOOM_MBRL. |
| Open Datasets | Yes | We evaluate our method on a challenging benchmark of 14 high-dimensional locomotion tasks drawn from the DeepMind Control Suite (DMC) [41] and the recently proposed Humanoid-Bench (H-Bench) [38]. |
| Dataset Splits | No | The paper describes reinforcement learning environments (DeepMind Control Suite and Humanoid-Bench tasks) rather than static datasets with explicit train/test/validation splits. Data is generated through interaction with these environments, and performance is evaluated periodically on the tasks themselves, not on predefined splits of a static dataset. |
| Hardware Specification | Yes | The CPU used is the AMD Ryzen Threadripper 3960X 24-Core Processor, and the GPU used is NVIDIA GeForce RTX 3090Ti. |
| Software Dependencies | No | The paper states, 'We base all our experiments on the released official TD-MPC2 codebase https://github.com/nicklashansen/tdmpc.' and mentions other official repositories for baselines, but it does not specify explicit version numbers for software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The paper includes 'Table 2: Hyperparameter settings.' in Appendix B.2, which details numerous specific hyperparameter values such as 'Learning rate 3 x 10^-4', 'Target network update rate 0.5', 'Discount factor (γ) 0.99', 'Replay batch size 256', 'MPPI Iterations 6', 'Horizon 3', and 'Entropy coefficient (α) 1 x 10^-4'. |
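
For readers who want to carry the quoted hyperparameters into a script, the values listed in the Experiment Setup row can be collected in a small configuration object. This is a minimal sketch: the class name `BoomHyperparameters` and all field names are hypothetical and are not taken from the paper or its code repository. Only the numeric values come from the row above, and the paper's Table 2 may list further settings not shown here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BoomHyperparameters:
    """Values quoted in the Experiment Setup row.

    The class and field names are illustrative assumptions, not the
    identifiers used by the authors.
    """

    learning_rate: float = 3e-4         # 'Learning rate 3 x 10^-4'
    target_update_rate: float = 0.5     # 'Target network update rate 0.5'
    discount_factor: float = 0.99       # 'Discount factor (gamma) 0.99'
    replay_batch_size: int = 256        # 'Replay batch size 256'
    mppi_iterations: int = 6            # 'MPPI Iterations 6'
    horizon: int = 3                    # 'Horizon 3'
    entropy_coefficient: float = 1e-4   # 'Entropy coefficient (alpha) 1 x 10^-4'


config = BoomHyperparameters()

# Print each setting so it can be compared against the paper's table.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

Freezing the dataclass keeps the recorded values from being changed by accident during a run, which suits a record meant to mirror published settings.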