QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos

Authors: Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shrivastava, David Luebke, Shalini De Mello

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach, QUEEN, on two benchmark datasets, containing diverse scenes with large geometric motion and illumination changes. QUEEN outperforms all prior state-of-the-art online FVV methods on all metrics. Notably, for several highly dynamic scenes, it reduces the model size to just 0.7 MB per frame while training in under 5 sec and rendering at 350 FPS.
Researcher Affiliation | Collaboration | Sharath Girish, University of Maryland (sgirish@cs.umd.edu); Tianye Li, NVIDIA (tianyel@nvidia.com); Amrita Mazumdar, NVIDIA (amritam@nvidia.com); Abhinav Shrivastava, University of Maryland (abhinav@cs.umd.edu); David Luebke, NVIDIA (dluebke@nvidia.com); Shalini De Mello, NVIDIA (shalinig@nvidia.com)
Pseudocode | No | The paper describes methods in text and figures, but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper states: 'We aim to release the code in the future.'
Open Datasets | Yes | We evaluate our method on two challenging FVV video datasets. (1) Neural 3D Videos (N3DV) [41] consists of six indoor scenes with forward-facing 20-view videos. (2) Immersive Videos [4] consists of seven indoor and outdoor scenes captured with 46 cameras.
Dataset Splits | No | The paper states: 'In both datasets, the central view is held out for testing.' and describes training on the remaining views. It does not explicitly define a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification | Yes | We train for 500 and 350 epochs for the first time-step, and for 10 and 15 epochs for the subsequent time-steps, for N3DV and Immersive, respectively, on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions building its implementation on [29] (3D Gaussian Splatting) and using the Adam optimizer [30], but does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train for 500 and 350 epochs for the first time-step, and for 10 and 15 epochs for the subsequent time-steps, for N3DV and Immersive, respectively... We set the SH degree to 2 for N3DV and 3 for Immersive. We set the score vector threshold t_d = 0.001 for all experiments... The position residual learning rate is set to 0.00016 for N3DV and 0.0005 for Immersive. Other hyperparameters are provided in Table 11. (See the configuration sketch after this table.)
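For quick reference, the reported per-dataset hyperparameters from the Experiment Setup row can be collected into a single configuration. The Python sketch below does only that; the key names (epochs_first_frame, sh_degree, pos_residual_lr, and so on) are illustrative placeholders rather than QUEEN's actual API, and any hyperparameter the paper defers to its Table 11 is deliberately left out rather than guessed.

# A minimal sketch, assuming hypothetical key names; numeric values are quoted from the paper.
QUEEN_CONFIG = {
    "N3DV": {
        "epochs_first_frame": 500,   # epochs for the first time-step
        "epochs_per_frame": 10,      # epochs for each subsequent time-step
        "sh_degree": 2,              # spherical harmonics degree
        "score_threshold": 0.001,    # score vector threshold t_d (same for all experiments)
        "pos_residual_lr": 0.00016,  # position residual learning rate
    },
    "Immersive": {
        "epochs_first_frame": 350,
        "epochs_per_frame": 15,
        "sh_degree": 3,
        "score_threshold": 0.001,
        "pos_residual_lr": 0.0005,
    },
    # Remaining hyperparameters appear only in the paper's Table 11 and are not reproduced here.
}

if __name__ == "__main__":
    for dataset, cfg in QUEEN_CONFIG.items():
        print(dataset, cfg)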