Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Instant4D: 4D Gaussian Splatting in Minutes

Authors: Zhanpeng Luo, Haoxi Ran, Li Lu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments; 4.1 Training and Inference Detail; 4.2 Evaluation on NVIDIA & Dycheck Benchmarks; 4.3 Evaluation on in-the-wild video; 4.4 Ablation and Analysis
Researcher Affiliation | Academia | Zhanpeng Luo (University of Pittsburgh, Zhanpeng EMAIL); Haoxi Ran (Carnegie Mellon University, EMAIL); Li Lu (Sichuan University, EMAIL)
Pseudocode | No | The paper includes diagrams and descriptions of the pipeline, but no formal pseudocode blocks or algorithms are presented. For example, Figure 2 illustrates the pipeline, and Section 3.3 describes the optimization process in text.
Open Source Code | No | Our project website is published at https://instant4d.github.io/. Justification: We have listed all the necessary details to reproduce the experiment. We will also release our code if accepted.
Open Datasets | Yes | We reconstruct scenes on the NVIDIA dataset [43] in 2 minutes on average and on the Dycheck [3] dataset in 7.2 minutes on average. To assess performance on in-the-wild video, we conduct qualitative experiments on the DAVIS dataset [18].
Dataset Splits | No | For the Dycheck iPhone dataset [3], we followed the evaluation protocol established by Jeong et al. [5]. Evaluation on NVIDIA: We evaluated INSTANT4D against several baseline methods on the NVIDIA Dynamic dataset following the protocol of [12]. This dataset consists of seven scenes, each with 12 frames captured from 12 camera viewpoints for training, with testing performed from fixed viewpoints at consecutive timestamps.
Hardware Specification | Yes | Testing on a single NVIDIA A6000 GPU, our Lite model completes the full training pipeline in 96 seconds with peak memory usage of 988 MB on the shortest sequence (235-frame "paper-windmill"), and 131 seconds with peak memory of 1,147 MB on the longest sequence (379-frame "apple").
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. It mentions using "standard 3DGS [6] hyperparameters" and models such as MegaSAM [11] and UniDepth [19], but not their implementation versions.
Experiment Setup | Yes | Implementation Detail: On the Dycheck iPhone dataset [3], we followed the evaluation protocol established by Jeong et al. [5]. We set the maximum optimization iterations to 5,000 and adopted the standard 3DGS [6] hyperparameters for loss weights and learning rates, with the exception of reducing the position learning rate to 1e-5 and extending the learning rate scheduler to 5,000 steps. Our initialization strategy differed between model variants. For the Lite model, we initialized 4D Gaussians with a voxel size of λ_s = 4 for static regions and λ_d = 4 for dynamic regions. In our Full model, we set λ_s = 1 but omitted the grid pruning step for dynamic regions to preserve detail and alleviate potential underfitting caused by the absence of densification. In practice, one can still enable densification for better training-view rendering quality. Temporal scaling was set to s_t = k fps for dynamic regions, while static regions used a constant scale equal to the entire video length (s_t = l_video). For the NVIDIA Dynamic Scene dataset [43], we reduced the maximum optimization iterations to 1,500 while maintaining the same hyperparameters as our implementation on Dycheck [3], adjusting only the learning rate scheduler's maximum step to match the shorter optimization cycle. The grid pruning and initialization parameters remained consistent across both datasets.
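
The settings quoted in the Experiment Setup row can be gathered into a small configuration object, which makes the differences between datasets and model variants easier to compare. The following is only a minimal sketch and not the authors' code, which this page reports as unreleased. The names Instant4DSetup and make_setup, and all field names, are hypothetical; only the numeric values are taken from the quoted text. Representing the Full model's skipped dynamic-region grid pruning as None is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instant4DSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    dataset: str                          # "dycheck" or "nvidia"
    variant: str                          # "lite" or "full"
    max_iterations: int                   # 5,000 on Dycheck, 1,500 on NVIDIA
    lr_scheduler_max_steps: int           # matched to max_iterations
    static_voxel_size: float              # lambda_s
    dynamic_voxel_size: Optional[float]   # lambda_d; None = grid pruning omitted
    position_lr: float = 1e-5             # reduced from the standard 3DGS value


# Iteration budgets per dataset, as quoted in the row above.
_ITERATIONS = {"dycheck": 5000, "nvidia": 1500}


def make_setup(dataset: str, variant: str) -> Instant4DSetup:
    """Build the (assumed) setup for one dataset / model-variant pair."""
    if dataset not in _ITERATIONS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if variant == "lite":
        # Lite model: lambda_s = 4 and lambda_d = 4.
        static_voxel, dynamic_voxel = 4.0, 4.0
    elif variant == "full":
        # Full model: lambda_s = 1, grid pruning skipped for dynamic regions.
        static_voxel, dynamic_voxel = 1.0, None
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    iterations = _ITERATIONS[dataset]
    return Instant4DSetup(
        dataset=dataset,
        variant=variant,
        max_iterations=iterations,
        lr_scheduler_max_steps=iterations,  # scheduler length follows the budget
        static_voxel_size=static_voxel,
        dynamic_voxel_size=dynamic_voxel,
    )


if __name__ == "__main__":
    for ds in ("dycheck", "nvidia"):
        for var in ("lite", "full"):
            print(make_setup(ds, var))
```

Initialization parameters depend only on the variant here because the quoted text states that grid pruning and initialization settings stayed the same across both datasets; only the iteration budget and scheduler length change with the dataset.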