Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Bootstrap Off-policy with World Model

Authors: Guojian Zhan, Likun Wang, Xiangteng Zhang, Jiaxin Gao, Masayoshi TOMIZUKA, Shengbo Eben Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the high-dimensional DeepMind Control Suite and Humanoid-Bench show that BOOM achieves state-of-the-art results in both training stability and final performance. The paper includes a dedicated '4 Experiments' section with '4.1 Experimental Setup', '4.2 Experimental Results', and '4.3 Ablation Study', featuring performance curves in Figure 2 and numerical results in Table 1.
Researcher Affiliation	Academia	Guojian Zhan1,2, Likun Wang1, Xiangteng Zhang1, Jiaxin Gao1, Masayoshi Tomizuka2, Shengbo Eben Li1 1 College of AI & School of Vehicle and Mobility, Tsinghua University 2 Berkeley AI Research (BAIR), UC Berkeley. The email 'EMAIL' also confirms academic affiliation.
Pseudocode	Yes	The paper contains a clearly labeled algorithm block: 'Algorithm 1 BOOM: Bootstrap Off-policy with World Model' on page 5.
Open Source Code	Yes	The code is accessible at https://github.com/molumitu/BOOM_MBRL.
Open Datasets	Yes	We evaluate our method on a challenging benchmark of 14 high-dimensional locomotion tasks drawn from the DeepMind Control Suite (DMC) [41] and the recently proposed Humanoid-Bench (H-Bench) [38].
Dataset Splits	No	The paper describes reinforcement learning environments (DeepMind Control Suite and Humanoid-Bench tasks) rather than static datasets with explicit train/test/validation splits. Data is generated through interaction with these environments, and performance is evaluated periodically on the tasks themselves, not on predefined splits of a static dataset.
Hardware Specification	Yes	The CPU used is the AMD Ryzen Threadripper 3960X 24-Core Processor, and the GPU used is NVIDIA GeForce RTX 3090Ti.
Software Dependencies	No	The paper states, 'We base all our experiments on the released official TD-MPC2 codebase https://github.com/nicklashansen/tdmpc.' and mentions other official repositories for baselines, but it does not specify explicit version numbers for software components like Python, PyTorch, or CUDA.
Experiment Setup	Yes	The paper includes 'Table 2: Hyperparameter settings.' in Appendix B.2, which details numerous specific hyperparameter values such as 'Learning rate 3 x 10^-4', 'Target network update rate 0.5', 'Discount factor (γ) 0.99', 'Replay batch size 256', 'MPPI Iterations 6', 'Horizon 3', and 'Entropy coefficient (α) 1 x 10^-4'.
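
The hyperparameter values quoted in the Experiment Setup row can be collected into a small configuration object, which may help when checking a reproduction run against the reported settings. This is a minimal sketch: the class name and field names are hypothetical and are not taken from the BOOM or TD-MPC2 code. Only the numeric values come from the row above, and Table 2 of the paper may list further settings not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotedHyperparameters:
    """Values quoted from 'Table 2: Hyperparameter settings.' (Appendix B.2).

    Field names are illustrative assumptions, not identifiers from the
    authors' repository.
    """

    learning_rate: float = 3e-4          # 'Learning rate 3 x 10^-4'
    target_update_rate: float = 0.5      # 'Target network update rate 0.5'
    discount_factor: float = 0.99        # 'Discount factor (γ) 0.99'
    replay_batch_size: int = 256         # 'Replay batch size 256'
    mppi_iterations: int = 6             # 'MPPI Iterations 6'
    horizon: int = 3                     # 'Horizon 3'
    entropy_coefficient: float = 1e-4    # 'Entropy coefficient (α) 1 x 10^-4'


config = QuotedHyperparameters()
```

Freezing the dataclass keeps the quoted values from being modified by accident while they are compared against a local training configuration.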