Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Authors: Chieh Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu H Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang, Yuheng Ren, Richard Newcombe, Zhao Dong, Zhengqin Li

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the previous predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can be readily adapted for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods. We evaluate DGS-LRM on DyCheck [25] and DAVIS [8]. In Table 1 and Figure 4, we show that our DGS-LRM outperforms the baseline predictive method, L4GM [67], while performing comparably to the state-of-the-art optimization-based reconstruction methods. In Table 2 and Figure 6, we evaluate the quality of the reconstructed scene flow on the PointOdyssey benchmark [107]. In Table 3, we show that each proposed components contribute to the final performance. The ablation is conducted at the 256×256 resolution training stage, and we compare the rendering quality on the DyCheck benchmark.
Researcher Affiliation | Collaboration | Chieh Hubert Lin (1, 2), Zhaoyang Lv (1), Songyin Wu (1, 3), Zhen Xu (1), Thu Nguyen-Phuoc (1), Hung-Yu Tseng (1), Julian Straub (1), Numair Khan (1), Lei Xiao (1), Ming-Hsuan Yang (1, 2), Yuheng Ren (1), Richard Newcombe (1), Zhao Dong (1), Zhengqin Li (1). Affiliations: (1) Meta, (2) UC Merced, (3) UC Santa Barbara
Pseudocode | No | The paper describes the methodology in prose and presents figures to illustrate concepts (Figure 2: DGS-LRM overview, Figure 3: A visualization of pixel-aligned deformable 3D Gaussians), but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code release requires additional review. We will put our best effort into releasing the codes and the pretrained model.
Open Datasets | Yes | We create a customized large-scale dataset using Kubric [26], featuring multi-view renderings paired with per-pixel 3D scene flow. We evaluate DGS-LRM on DyCheck [25] and DAVIS [8]. In Table 2 and Figure 6, we evaluate the quality of the reconstructed scene flow on the PointOdyssey benchmark [107].
Dataset Splits | No | We render the Kubric dataset according to these two setups and create 40,000 scenes (each with 4 synchronized cameras) for both resolutions. We use the iPhone subset of DyCheck, which includes two synchronized novel-view cameras for reconstruction metrics evaluation. The iPhone subset contains 7 long monocular videos, 200-400 frames each. In addition, DyCheck also labels the covisibility between training and novel-view cameras and evaluates the masked version of reconstruction metrics. PointOdyssey includes 13 videos (ranging from 1,000 frames to 4,000 frames) of synthetic scenes with humanoid and animal meshes articulated with transferred real-world motions.
Hardware Specification | Yes | We train our method with 64 H100 GPUs with 80GB VRAM.
Software Dependencies | No | Similar to GS-LRM, we apply the common practice to save GPU VRAMs using xFormers [41], deferred backpropagation [101], gradient checkpointing [12], and BF16 mixed-precision training [36].
Experiment Setup | Yes | For all variants of DGS-LRM, we use N = 24 input frames with temporal sampling rate l = 4, which results in 6 keyframes after temporal tokenization. We use K = 4 for reference views and set the number of output views per scene to Q = 8. For training efficiency, we first train the model at 256×256 resolution and then fine-tune it at 512×512 resolution. ... For the first stage of training, we use a batch size of 15 per GPU, train for 40k iterations with a learning rate of 4e-4, and then decay to 1e-6 with a cosine learning rate scheduler. For the second stage, we use a batch size of 8 per GPU, train for 20k iterations with a learning rate of 1e-4, and then decay to 1e-6 with a cosine learning rate scheduler. For both stages, we use a learning rate warm-up for 500 iterations, which linearly ramps up the learning rate from 0 to the initial learning rate.
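
The two-stage schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative reading of the quoted text, not the authors' code (which has not been released). The function and constant names are hypothetical, and two points are assumptions rather than facts from the paper: that the cosine decay starts right after the 500 warm-up iterations and ends at the last iteration, and that all 64 GPUs from the Hardware Specification row are used data-parallel, which is what the global batch sizes rely on.

```python
import math

# Values quoted in the Experiment Setup and Hardware Specification rows.
WARMUP_STEPS = 500
FINAL_LR = 1e-6
NUM_GPUS = 64  # assumption: every GPU holds one data-parallel replica
STAGES = {
    "stage1_256": {"peak_lr": 4e-4, "total_steps": 40_000, "batch_per_gpu": 15},
    "stage2_512": {"peak_lr": 1e-4, "total_steps": 20_000, "batch_per_gpu": 8},
}


def learning_rate(step, peak_lr, total_steps,
                  warmup_steps=WARMUP_STEPS, final_lr=FINAL_LR):
    """Linear warm-up from 0 to peak_lr, then cosine decay to final_lr.

    Assumption: the cosine phase covers the steps after warm-up; the
    quoted text does not say whether warm-up is counted inside it.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


def global_batch_size(batch_per_gpu, num_gpus=NUM_GPUS):
    """Samples per iteration under the data-parallel assumption above."""
    return batch_per_gpu * num_gpus


def num_keyframes(num_input_frames, temporal_rate):
    """Keyframes after temporal tokenization: N = 24, l = 4 gives 6."""
    return num_input_frames // temporal_rate


if __name__ == "__main__":
    for name, cfg in STAGES.items():
        lrs = [learning_rate(s, cfg["peak_lr"], cfg["total_steps"])
               for s in (0, 250, 500, cfg["total_steps"])]
        print(name, global_batch_size(cfg["batch_per_gpu"]), lrs)
    print("keyframes:", num_keyframes(24, 4))
```

Under these assumptions the global batch would be 960 samples in the first stage and 512 in the second. The paper reports only per-GPU batch sizes, so these totals are inferred, not reported.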