Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

Authors: Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we first provide implementation details of the proposed method and then validate our method on 4D Gaussian representations with (1) 4D novel view synthesis and (2) 4D generation. We tested on the Plenoptic Video Datasets (Li et al., 2022) and the Consistent4D Dataset (Jiang et al., 2023) for both quantitative and qualitative evaluation. Our method achieves state-of-the-art results in both tasks. ... With the proposed flow supervision, our method shows improved performance on all scenes and the gains are prominent on dynamic regions. Consequently, our 4D novel view synthesis results achieve state-of-the-art quality. Both qualitative and quantitative comparisons on the NeRF-DS dataset in Fig. 10 and Tab. 2 show the effectiveness of the proposed method on scenes with complex camera motions. ... We evaluate and compare our method with DreamGaussian4D (Ren et al., 2023), a recent 4D Gaussian-based state-of-the-art generative model with open-sourced code, and dynamic NeRF-based methods D-NeRF (Pumarola et al., 2021), K-planes (Fridovich-Keil et al., 2023) and Consistent4D (Jiang et al., 2023) in Tab. 3 on the Consistent4D dataset.
Researcher Affiliation | Collaboration | Quankai Gao (EMAIL), University of Southern California; Qiangeng Xu (EMAIL), Google; Zhe Cao (EMAIL), Google; Ben Mildenhall (EMAIL), Google; Wenchao Ma (EMAIL), Pennsylvania State University; Le Chen (EMAIL), Max Planck Institute for Intelligent Systems; Danhang Tang (EMAIL), Google; Ulrich Neumann (EMAIL), University of Southern California
Pseudocode | Yes | A detailed pseudocode for our flow supervision can be found in Algorithm 1. We extract the projected Gaussian dynamics and obtain the final Gaussian flow by rendering these dynamics.
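To make the quoted description concrete, here is a minimal PyTorch sketch of "rendering" Gaussian dynamics into a per-pixel flow map by blending each contributing Gaussian's 2D motion with its splatting weight. The tensor layout (`topk_idx`, `topk_w`) and the restriction to center translation are assumptions for illustration; the paper's full flow formulation also accounts for each Gaussian's shape change, not only its center shift.

```python
import torch

def render_gaussian_flow(topk_idx, topk_w, mu_t, mu_t1):
    """Minimal sketch of splatting per-Gaussian dynamics into per-pixel flow.
    The tensor layout is assumed, not the authors' implementation.

    topk_idx: (H, W, K) long  - indices of the K Gaussians blended at each pixel
    topk_w:   (H, W, K) float - their alpha-blending weights from splatting
    mu_t:     (N, 2)    float - projected 2D Gaussian centers at time t
    mu_t1:    (N, 2)    float - projected 2D Gaussian centers at time t+1
    returns:  (H, W, 2) float - rendered Gaussian flow
    """
    dyn = mu_t1 - mu_t                       # per-Gaussian 2D motion, (N, 2)
    per_pixel = dyn[topk_idx]                # gather each pixel's contributors, (H, W, K, 2)
    # Blend with the same weights used for color compositing, so the rendered
    # flow stays consistent with the rendered image.
    return (topk_w.unsqueeze(-1) * per_pixel).sum(dim=2)
```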
Open Source Code | No | Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/
Open Datasets | Yes | NeRF-DS Dataset. This dataset (Yan et al., 2023) consists of 8 scenes in everyday environments with various types of moving or deforming specular objects. Consistent4D Dataset. This dataset (Jiang et al., 2023) includes 14 synthetic and 12 in-the-wild monocular videos. Plenoptic Video Dataset. A high-quality real-world dataset consisting of 6 scenes at 30 FPS and 2028 × 2704 resolution.
Dataset Splits | Yes | Plenoptic Video Dataset. ... There are 15 to 20 camera views per scene for training and 1 camera view for testing. ... Consistent4D Dataset. ... Each input monocular video with a static camera is set at an azimuth angle of 0°. Ground-truth images include four distinct views at azimuth angles of -75°, 15°, 105°, and 195°, respectively, while keeping elevation, radius, and other camera parameters the same as the input camera.
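As a sketch of this evaluation protocol, the hypothetical helper below places cameras at the four ground-truth azimuths on a sphere around the object. The radius, elevation, and y-up convention are placeholder assumptions, not values taken from the dataset.

```python
import numpy as np

def eval_camera_positions(radius=2.0, elevation_deg=0.0):
    """Consistent4D-style evaluation views: the input video sits at azimuth 0
    degrees; ground truth is rendered at four fixed azimuths with elevation
    and radius unchanged. Radius/elevation here are illustrative placeholders."""
    azimuths_deg = [-75.0, 15.0, 105.0, 195.0]   # ground-truth view azimuths
    elev = np.deg2rad(elevation_deg)
    positions = []
    for az_deg in azimuths_deg:
        az = np.deg2rad(az_deg)
        # Camera center on a sphere around the object (y-up convention assumed).
        positions.append(np.array([
            radius * np.cos(elev) * np.sin(az),
            radius * np.sin(elev),
            radius * np.cos(elev) * np.cos(az),
        ]))
    return azimuths_deg, positions
```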
Hardware Specification | No | Even with the extra memory footprint of tracking per-pixel gradients for Gaussians, a single 30 GB GPU is adequate for reproducing all our results.
Software Dependencies | No | And the Gaussian flow flow_G is calculated by Eq. 8 with PyTorch. ... Variables including the weights and top-K indices of Gaussians per pixel (as mentioned in implementation details of our main paper) are calculated in CUDA by modifying the original CUDA kernel code of 3D Gaussian Splatting (Kerbl et al., 2023).
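The quoted split (CUDA records per-pixel blending weights and top-K indices; PyTorch then composes the flow) can be illustrated with the following Python stand-in for what the modified rasterizer records at one pixel. The function name, K value, and per-pixel input layout are assumptions, not the authors' kernel.

```python
import torch

def record_topk_weights(alphas, gauss_ids, K=20):
    """Python stand-in for the modified 3DGS CUDA kernel: during front-to-back
    alpha blending at one pixel, record each Gaussian's blending weight
    w_i = alpha_i * T_i and keep the top-K contributors.

    alphas:    (M,) opacities of the Gaussians hitting this pixel, depth-sorted
    gauss_ids: (M,) their global Gaussian indices
    """
    T = 1.0                                  # accumulated transmittance
    weights = torch.empty_like(alphas)
    for i in range(alphas.numel()):
        weights[i] = alphas[i] * T           # contribution of Gaussian i to this pixel
        T = T * (1.0 - alphas[i].item())     # front-to-back transmittance update
    w, idx = weights.topk(min(K, weights.numel()))
    return w, gauss_ids[idx]
```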
Experiment Setup | Yes | In our 4D generation experiment, we run 500 iterations of static optimization to initialize 3D Gaussian fields with a batch size of 16. The T_max in SDS is linearly decayed from 0.98 to 0.02. For dynamic representation, we run 600 iterations with a batch size of 4 for both DG4D (Ren et al., 2023) and ours. The flow loss weight λ1 in Eq. 11 of our main paper is 1.0. Our method slightly decreases speed and increases memory only in the training stage, not at inference, because our flow supervision only serves to train a better/more robust deformation field or other 4DGS designs and is not needed at inference time. The training speed for DG4D is around 1.4 it/s while it becomes around 2.2 it/s with our flow supervision. And the difference between training speeds with (around 2.5 s/it) and without (around 2.2 s/it) our flow supervision for RT-4DGS is marginal. Even with the extra memory footprint of tracking per-pixel gradients for Gaussians, a single 30 GB GPU is adequate for reproducing all our results. In our 4D novel view synthesis experiment, we follow RT-4DGS (Yang et al., 2023c) except that we add our proposed flow supervision for all cameras. The flow loss weight λ1 in Eq. 11 of our main paper is 0.5.
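Two of the quoted hyperparameters are easy to mis-implement, so here is a small sketch of the linearly decayed SDS timestep ceiling and the flow-weighted total loss. The function names and step-based parameterization are assumptions; λ1 is 1.0 for 4D generation and 0.5 for 4D novel view synthesis, as quoted.

```python
def sds_t_max(step, total_steps, start=0.98, end=0.02):
    """Linear decay of the SDS maximum timestep ratio from 0.98 to 0.02
    over training, as quoted above (step-based parameterization assumed)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def total_loss(loss_render, loss_flow, lambda_1):
    """Flow supervision added to the rendering loss, following Eq. 11 as
    quoted: lambda_1 = 1.0 for 4D generation, 0.5 for novel view synthesis."""
    return loss_render + lambda_1 * loss_flow
```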