Splatter a Video: Video Gaussian Representation for Versatile Processing
Authors: Yang-Tian Sun, Yihua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6, Experiments. Evaluation: We conducted experiments on the DAVIS dataset [36] as well as some videos used by OmniMotion [48] and CoDeF [33]. Our approach is evaluated based on two criteria: 1) reconstructed video quality and 2) downstream video processing tasks. In addition to general video representation methods Deformable Sprites [58], OmniMotion [48] and CoDeF [33], we also compare with dynamic NeRF/3DGS methods, namely 4DGS [51] and RoDynRF [23]. |
| Researcher Affiliation | Collaboration | Yang-Tian Sun¹, Yi-Hua Huang¹, Lin Ma, Xiaoyang Lyu¹, Yan-Pei Cao², Xiaojuan Qi¹ (¹The University of Hong Kong, ²VAST) |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | Code and data are not provided for now but will be released to the public. |
| Open Datasets | Yes | Evaluation: We conducted experiments on the DAVIS dataset [36] as well as some videos used by OmniMotion [48] and CoDeF [33]. |
| Dataset Splits | No | The paper mentions using the DAVIS dataset but does not provide specific training, validation, or test splits such as percentages or sample counts. |
| Hardware Specification | Yes | The training duration is approximately 15-20 minutes on an NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper mentions using RAFT [44], Marigold [15], SAM [17], and DINOv2 [31] but does not provide specific version numbers for these or other software libraries/frameworks used for implementation. |
| Experiment Setup | Yes | Typically, we use a video clip of about 50-100 frames and train the system iteratively for 20,000 steps. ... The Gaussians are initialized as 100,000 points randomly sampled in a [-1, 1] x [-1, 1] x [0, 1] box. ... Every 100 steps, Gaussians with an accumulated gradient scale of positions above a threshold will be densified. ... The loss weights for render, depth, flow, motion regularization, and label are set to λrender = 5.0, λdepth = 1.0, λflow = 2.0, λarap = 0.1, and λlabel = 1.0. (Table 3 also provides specific learning rates for various attributes.) |
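The initialization and loss weighting quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration using NumPy, not the paper's released code; the function names and the `total_loss` helper are hypothetical, while the point count, sampling box, and loss weights come from the quoted setup.

```python
import numpy as np

# Loss weights as reported in the paper's experiment setup
# (lambda_arap is the motion-regularization term).
LOSS_WEIGHTS = {
    "render": 5.0,  # lambda_render
    "depth": 1.0,   # lambda_depth
    "flow": 2.0,    # lambda_flow
    "arap": 0.1,    # lambda_arap
    "label": 1.0,   # lambda_label
}


def init_gaussian_positions(n: int = 100_000, seed: int = 0) -> np.ndarray:
    """Sample n Gaussian centers uniformly in [-1, 1] x [-1, 1] x [0, 1]."""
    rng = np.random.default_rng(seed)
    low = np.array([-1.0, -1.0, 0.0])
    high = np.array([1.0, 1.0, 1.0])
    return rng.uniform(low, high, size=(n, 3))


def total_loss(terms: dict) -> float:
    """Weighted sum of the five individual loss terms (hypothetical helper)."""
    return sum(LOSS_WEIGHTS[k] * v for k, v in terms.items())


positions = init_gaussian_positions()
print(positions.shape)  # (100000, 3)
```

In a real training loop each entry of `terms` would be a differentiable tensor (e.g. a PyTorch scalar) rather than a float, but the weighting logic is the same.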