Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RoboScape: Physics-informed Embodied World Model
Authors: Yu Shang, Xin Zhang, Yinzhou Tang, Lei Jin, Chen Gao, Wei Wu, Yong Li
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that RoboScape generates videos with superior visual fidelity and physical plausibility across diverse robotic scenarios. We further validate its practical utility through downstream applications including robotic policy training with generated data and policy evaluation. We conduct comprehensive experiments to evaluate our world model from three aspects: video generation quality, robotic policy learning using synthetic data, and robotic policy evaluation. |
| Researcher Affiliation | Collaboration | Yu Shang¹, Xin Zhang², Yinzhou Tang¹, Lei Jin¹, Chen Gao¹, Wei Wu², Yong Li¹; ¹Tsinghua University, ²Manifold AI |
| Pseudocode | No | The paper describes the methodology using text and diagrams (e.g., Figure 2: 'Overview of the physics-informed world model'), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and demos are available at: https://github.com/tsinghua-fib-lab/RoboScape. |
| Open Datasets | Yes | In our experiment, we use 50,000 videos extracted from the AgiBot World-Beta dataset [40], covering 147 tasks and 72 skills. ... We further validated our approach using the π0 [32] model on the challenging LIBERO [44] task suite. ... In the experiments on the Robomimic Lift task [43]... |
| Dataset Splits | Yes | Our dataset comprises approximately 6.5M training clips and 1.2K test clips. |
| Hardware Specification | Yes | Training completes in approximately 24 hours on a cluster of 32 NVIDIA A800-SXM4-80GB GPUs. |
| Software Dependencies | No | The paper mentions several tools and models, such as MAGVIT-2, Video Depth Anything, SpatialTracker, TransNet V2, InternVL, FlowNet, Diffusion Policy (DP), and π0. However, it does not specify version numbers for these software components or for any programming languages or libraries (e.g., Python, PyTorch) used. |
| Experiment Setup | Yes | We preprocess videos by extracting 16-frame clips sampled at 2 Hz, yielding approximately 6.5 million training clips. The model is trained for 5 epochs using the following hyperparameters: λ1 = 1, λ2 = 0.01, λ3 = 1, and γ = 5. During inference, we use the first frame as a conditional input to autoregressively predict the subsequent 15 frames. |
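
The preprocessing and inference protocol described in the Experiment Setup row can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the timestamp-based resampling, the non-overlapping clip windows, and the `predict_next` callable standing in for the world model are all hypothetical. Only the 16-frame clip length, the 2 Hz sampling rate, and first-frame-conditioned prediction of the remaining 15 frames come from the report.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")

CLIP_LEN = 16     # frames per clip (from the report)
TARGET_HZ = 2.0   # sampling rate in Hz (from the report)


def sample_indices(num_frames: int, source_fps: float,
                   target_hz: float = TARGET_HZ) -> List[int]:
    """Source-frame indices kept when resampling from source_fps to target_hz.

    Assumption (hypothetical): take the frame at or just before each
    1/target_hz time step; the paper does not state its resampling rule.
    """
    step = source_fps / target_hz  # source frames per sampled frame
    indices: List[int] = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def extract_clips(num_frames: int, source_fps: float,
                  clip_len: int = CLIP_LEN) -> List[List[int]]:
    """Group resampled indices into clips of clip_len frames.

    Assumption (hypothetical): clips do not overlap, and a trailing
    partial clip is discarded.
    """
    idx = sample_indices(num_frames, source_fps)
    return [idx[i:i + clip_len]
            for i in range(0, len(idx) - clip_len + 1, clip_len)]


def rollout(first_frame: Frame,
            predict_next: Callable[[Sequence[Frame]], Frame],
            horizon: int = CLIP_LEN - 1) -> List[Frame]:
    """Autoregressive inference: condition on the first frame, then predict
    `horizon` frames one at a time, feeding each prediction back in.

    `predict_next` is a hypothetical stand-in for the world model.
    """
    frames: List[Frame] = [first_frame]
    for _ in range(horizon):
        frames.append(predict_next(frames))
    return frames


if __name__ == "__main__":
    clips = extract_clips(num_frames=300, source_fps=30.0)  # 10 s at 30 fps
    print(len(clips), clips[0][:4])
    # Toy "model" that just increments the last frame value.
    print(rollout(0, lambda fs: fs[-1] + 1))
```

For a 10 s video at 30 fps (300 frames), resampling to 2 Hz keeps 20 frames, which yields one full 16-frame clip; under the non-overlap assumption the remaining 4 frames are dropped. Whether the original pipeline uses overlapping windows, which would produce more clips per video, is not stated in the report.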