Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Building 3D Representations and Generating Motions From a Single Image via Video-Generation

Authors: Weiming Zhi, Ziyong Ma, Tianyi Zhang, Matthew Johnson-Roberson

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate VGER over a diverse set of indoor and outdoor environments. We demonstrate its ability to produce smooth motions that account for the captured geometry of a scene, all from a single RGB input image.
Researcher Affiliation | Collaboration | Weiming Zhi 1,2,3 Ziyong Ma 3 Tianyi Zhang 3,4 Matthew Johnson-Roberson 2,3 1 School of Computer Science, The University of Sydney, Australia. 2 College of Connected Computing, Vanderbilt University, TN, USA. 3 Robotics Institute, Carnegie Mellon University, PA, USA. 4 Aurora, USA. Correspondence to W. Zhi (EMAIL).
Pseudocode | No | The paper describes the VGER method pipeline in Section 3 and provides an overview in Figure 2. However, it does not include a dedicated section or figure labeled as 'Pseudocode' or 'Algorithm' with structured, code-like steps for any part of the methodology. The description is in narrative text form.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Though code has not been released during the review, the authors commit to releasing code with a camera-ready version of this manuscript.
Open Datasets | No | To evaluate the robustness and quality of our proposed Video-generation Environment Representation (VGER) along with the produced motion trajectories, we collect sets of 10 images in both outdoor and indoor scenes. These include the complex multi-object scenes Garden, Bench, Table, and Indoor, along with the more object-centric Stone and Cabinet scenes.
Dataset Splits | No | To evaluate the robustness and quality of our proposed Video-generation Environment Representation (VGER) along with the produced motion trajectories, we collect sets of 10 images in both outdoor and indoor scenes... For the evaluation of the quality of the performance of VGER and benchmark models, we extract the structure from a single image. We use the entire set of ten images, passed to DUSt3R, to construct a representation that we then consider to be the ground truth.
Hardware Specification | Yes | We run our experiments on a standard desktop with an Intel i9 CPU and an NVIDIA RTX 4090 GPU with 24GB VRAM.
Software Dependencies | No | We use all the standard hyper-parameters in the Seva and DUSt3R foundation models used in our pipeline. Here list all the hyper-parameters used for the reconstruction and motion policies construction of our VGER method in table A1. Seva is a 1.3 billion parameter diffusion model, using a stable diffusion 2.1 backbone [28]. It produces a sequence of output images {I1, I2, . . . , In}, conditional on the input image I0 and a trajectory of camera poses {T1, . . . , Tn}, where n is a pre-determined image sequence length. The implicit model fθ is represented by a neural network, that incorporates SIREN activation layers [30].
Experiment Setup | Yes | A1 Implementation Details... Here list all the hyper-parameters used for the reconstruction and motion policies construction of our VGER method in table A1. Implicit Model Multi-Scale Sampling: αsurf 0.5, αeik 0.1, σmin 0.0025, σmax 0.1; Network for Implicit Model: Layers 3, Width 256, ω0 25, Training Epochs per LR 2000, |P| 10000, Learning rates (LRs) 3e-4, 1e-4, 5e-5, 1e-5, |Bs| 5000; Motion Policy Blow-up: k 20, β 100, ϵ 10 8
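
For readers who want to reuse the values quoted in the Experiment Setup row, the hyper-parameters can be gathered into a small configuration object. The following is a minimal sketch, not the authors' code (which had not been released at review time). The class and field names are hypothetical, reading |P| as a point count and |Bs| as a batch size is an interpretation, and the staged schedule (each listed learning rate applied in order for 2000 epochs) is an assumption drawn from the phrase "Training Epochs per LR". The blow-up ϵ is left out because its value is garbled in the extracted text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImplicitModelConfig:
    """Values quoted from table A1; field names are hypothetical."""
    # Multi-scale sampling
    alpha_surf: float = 0.5
    alpha_eik: float = 0.1
    sigma_min: float = 0.0025
    sigma_max: float = 0.1
    # Network for the implicit model
    layers: int = 3
    width: int = 256
    omega_0: float = 25.0          # SIREN frequency parameter
    epochs_per_lr: int = 2000
    num_points: int = 10000        # |P| (interpretation: point count)
    learning_rates: Tuple[float, ...] = (3e-4, 1e-4, 5e-5, 1e-5)
    batch_size: int = 5000         # |Bs| (interpretation: batch size)


@dataclass(frozen=True)
class MotionPolicyConfig:
    """Blow-up parameters from table A1; epsilon omitted (garbled in source)."""
    k: int = 20
    beta: float = 100.0


def total_epochs(cfg: ImplicitModelConfig) -> int:
    """Total training epochs if every learning rate runs for epochs_per_lr."""
    return cfg.epochs_per_lr * len(cfg.learning_rates)


def lr_at_epoch(cfg: ImplicitModelConfig, epoch: int) -> float:
    """Learning rate at a zero-based epoch under the assumed staged schedule."""
    if not 0 <= epoch < total_epochs(cfg):
        raise ValueError(f"epoch {epoch} outside [0, {total_epochs(cfg)})")
    return cfg.learning_rates[epoch // cfg.epochs_per_lr]
```

Under this assumed schedule, training would span four stages of 2000 epochs each, with `lr_at_epoch` stepping down through the listed rates at the stage boundaries.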