Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HoloScene: Simulation‑Ready Interactive 3D Worlds from a Single Video

Authors: Hongchi Xia, Chih-Hao Lin, Hao-Yu Hsu, Quentin Leboutet, Katelyn Gao, Michael Paulitsch, Benjamin Ummenhofer, Shenlong Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations conducted on multiple benchmark datasets demonstrate superior performance, while practical use-cases in interactive gaming and real-time digital-twin manipulation illustrate HoloScene's broad applicability and effectiveness. ... Experiments on three challenging benchmarks demonstrate superior geometry accuracy and physical plausibility, with rendering performance comparable to state-of-the-art amodal and physics-aware reconstruction methods. ... 4 Experiments. Dataset: We conduct the experiments across multiple datasets: 3 scenes from Replica [64], 3 scenes from Scannet++ [89], 2 scenes from iGibson [26], and one self-captured scene. Metrics: We evaluate geometry quality with Chamfer Distance (CD), F-Score (F1), and Normal Consistency (NC) [91], and assess rendering quality using PSNR, SSIM, and LPIPS. ... Baselines: We evaluate our framework against SOTA approaches in instance-aware amodal 3D scene reconstruction.
Researcher Affiliation | Collaboration | Hongchi Xia¹, Chih-Hao Lin¹, Hao-Yu Hsu¹, Quentin Leboutet², Katelyn Gao², Michael Paulitsch², Benjamin Ummenhofer², Shenlong Wang¹. ¹University of Illinois Urbana-Champaign, ²Intel
Pseudocode | No | The paper describes the three-stage inference process (Gradient-based Optimization, Sampling-based Optimization, Gradient-based Refinement) in prose within Section 3.3 and summarizes it in Figure 2, but does not present a formal pseudocode or algorithm block with numbered steps or a dedicated 'Algorithm' section.
Open Source Code | No | Project page: here. ... We would open-source the data and code of the paper.
Open Datasets | Yes | Dataset: We conduct the experiments across multiple datasets: 3 scenes from Replica [64], 3 scenes from Scannet++ [89], 2 scenes from iGibson [26], and one self-captured scene. ... The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019) ... Scannet++: A high-fidelity dataset of 3D indoor scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12-22 (2023) ... igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. arXiv preprint arXiv:2108.03272 (2021)
Dataset Splits | No | The paper states: 'We conduct the experiments across multiple datasets: 3 scenes from Replica [64], 3 scenes from Scannet++ [89], 2 scenes from iGibson [26], and one self-captured scene.' and 'We adopt three datasets as we mentioned in our main paper: Replica [64], Scannet++ [89], and iGibson [26].' While specific scenes are mentioned, the paper does not specify how these datasets are partitioned into training, validation, or test sets, whether by percentages, sample counts, or references to predefined splits.
Hardware Specification | Yes | The optimization takes approximately 4 hours, 4 hours, and 20 minutes for stages 1, 2, and 3, respectively, on a single A6000 GPU.
Software Dependencies | No | We utilize Marigold [14] for monocular depth and normal estimation. ... We adopt the Isaac Sim [47] as the physical simulator, ... We adopt their open-source codes and adapt them for the testing benchmarks. ... With our reconstructed environment, we can create a real-time interactive game with Unreal Engine [12]. ... we first inpaint occluded regions using LaMa [66] before generating these views. ... we prompt Wonder3D's multi-diffusion model with real-world observations and generate virtual views I_i from various viewpoints. ... We recover the camera pose with VGGT [69], adjust the predicted depth [54] to align with the virtual scene, and adopt FoundationPose [73] for object tracking with our reconstructed 3D object for model-based 6D pose estimation. ... We adopt visual effects from AutoVFX [18] to overlay virtual content and shadows onto the image. ... We use UniDepthV2 [54] to predict the depth map for each frame and compute a scale ratio between the predicted monocular depth and depth rendered from the 3D Gaussians. ... For object tracking, we adopt FoundationPose [73], using our high-quality reconstructed 3D asset as a CAD model for model-based 6D pose estimation.
Experiment Setup | Yes | Implementation details: Our inference pipeline consists of three stages. Stage 1 employs gradient-based optimization for 100k steps with loss weights λ_rgb = 1.0, λ_mask = 0.5, λ_depth = 0.5, and λ_normal = 0.1. Stage 2 uses sampling-based optimization with λ_rgb = 2.0, λ_mask = 0.5, λ_depth = 10.0, λ_normal = 10.0, λ_pene = 5.0, generating three samples per instance. Stage 3 refines Gaussians via gradient-based optimization with λ_l1 = 0.95 and λ_ssim = 0.05. The optimization takes approximately 4 hours, 4 hours, and 20 minutes for stages 1, 2, and 3, respectively, on a single A6000 GPU. We utilize Marigold [14] for monocular depth and normal estimation.
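
Illustrative sketch (not the authors' code): the per-stage loss weights quoted in the Experiment Setup row can be collected into a small configuration and combined as a weighted sum. The weight values come from the row above. The dictionary keys, the `weighted_loss` helper, and the assumption that each stage's objective is a plain weighted sum of its terms are hypothetical and chosen only for illustration. The Stage 3 weight of 0.95 is read here as an L1 weight, which is also an assumption.

```python
# Loss weights as quoted in the Experiment Setup row. The key names are
# illustrative and are NOT identifiers from the authors' implementation.
STAGE_WEIGHTS = {
    "stage1": {"rgb": 1.0, "mask": 0.5, "depth": 0.5, "normal": 0.1},
    "stage2": {"rgb": 2.0, "mask": 0.5, "depth": 10.0, "normal": 10.0, "pene": 5.0},
    # Assumption: the 0.95 weight reported for Stage 3 applies to an L1 term.
    "stage3": {"l1": 0.95, "ssim": 0.05},
}


def weighted_loss(stage, terms):
    """Combine per-term loss values into one scalar for the given stage.

    Assumes the stage objective is a plain weighted sum, which the quoted
    text does not state explicitly.
    """
    weights = STAGE_WEIGHTS[stage]
    missing = sorted(set(weights) - set(terms))
    if missing:
        raise KeyError(f"missing loss terms for {stage}: {missing}")
    return sum(weight * terms[name] for name, weight in weights.items())


if __name__ == "__main__":
    # Placeholder loss values, used only to exercise the helper.
    example_terms = {"rgb": 0.02, "mask": 0.10, "depth": 0.05, "normal": 0.30}
    print(weighted_loss("stage1", example_terms))
```

Keeping the weights in one table per stage makes it easy to compare a re-implementation against the reported settings. The loss terms themselves (rendering, mask, depth, normal, penetration) would still need to be defined as in the paper.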