Splatter a Video: Video Gaussian Representation for Versatile Processing

Authors: Yang-Tian Sun, Yihua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | (Sec. 6, Experiments / Evaluation) We conducted experiments on the DAVIS dataset [36] as well as some videos used by OmniMotion [48] and CoDeF [33]. Our approach is evaluated based on two criteria: 1) reconstructed video quality and 2) downstream video processing tasks. In addition to general video representation methods Deformable Sprites [58], OmniMotion [48], and CoDeF [33], we also compare with dynamic NeRF/3DGS methods, namely 4DGS [51] and RoDynRF [23].
Researcher Affiliation | Collaboration | Yang-Tian Sun (1), Yi-Hua Huang (1), Lin Ma, Xiaoyang Lyu (1), Yan-Pei Cao (2), Xiaojuan Qi (1); (1) The University of Hong Kong, (2) VAST
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | No | Code and data are not provided for now but will be released to the public.
Open Datasets | Yes | (Evaluation) We conducted experiments on the DAVIS dataset [36] as well as some videos used by OmniMotion [48] and CoDeF [33].
Dataset Splits | No | The paper mentions using the DAVIS dataset but does not provide specific training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | The training duration is approximately 15-20 minutes on an NVIDIA 3090 GPU.
Software Dependencies | No | The paper mentions using RAFT [44], Marigold [15], SAM [17], and DINOv2 [31], but does not provide specific version numbers for these or for any other software libraries/frameworks used in the implementation.
Experiment Setup | Yes | Typically, we use a video clip of about 50-100 frames and train the system iteratively for 20,000 steps. ... The Gaussians are initialized as 100,000 points randomly sampled in a [-1, 1] × [-1, 1] × [0, 1] box. ... Every 100 steps, Gaussians with an accumulated gradient scale of positions above a threshold will be densified. ... The loss weights for render, depth, flow, motion regularization, and label are set to λ_render = 5.0, λ_depth = 1.0, λ_flow = 2.0, λ_arap = 0.1, and λ_label = 1.0. (Table 3 also provides specific learning rates for the various attributes.)
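
The experiment-setup row above maps onto a few lines of code. The following PyTorch sketch is a hypothetical illustration of the reported configuration (box initialization, weighted loss, periodic densification), not the authors' implementation, which has not been released; the function names and the densification threshold are assumptions.

```python
import torch

# Loss weights as reported in the paper's experiment setup.
LAMBDA = {"render": 5.0, "depth": 1.0, "flow": 2.0, "arap": 0.1, "label": 1.0}

def init_gaussian_centers(n: int = 100_000) -> torch.Tensor:
    """Uniformly sample n initial Gaussian centers in the [-1, 1] x [-1, 1] x [0, 1] box."""
    low = torch.tensor([-1.0, -1.0, 0.0])
    high = torch.tensor([1.0, 1.0, 1.0])
    return low + (high - low) * torch.rand(n, 3)

def total_loss(terms: dict) -> torch.Tensor:
    """Weighted sum of the per-term losses (render, depth, flow, arap, label)."""
    return sum(LAMBDA[name] * loss for name, loss in terms.items())

def maybe_densify(centers: torch.Tensor, grad_accum: torch.Tensor,
                  step: int, threshold: float = 2e-4) -> torch.Tensor:
    """Every 100 steps, clone Gaussians whose accumulated positional gradient
    exceeds a threshold. The threshold value here is a placeholder; the paper
    states only that such a threshold is used."""
    if step % 100 != 0:
        return centers
    mask = grad_accum > threshold
    return torch.cat([centers, centers[mask]], dim=0)
```

A training loop under this sketch would evaluate total_loss on the five supervision terms at each of the 20,000 steps and call maybe_densify every 100 steps.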