Dynamic 3D Gaussian Fields for Urban Areas

Authors: Tobias Fischer, Jonas Kulhanek, Samuel Rota Bulò, Lorenzo Porzi, Marc Pollefeys, Peter Kontschieder

NeurIPS 2024 | Conference PDF | Archive PDF

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In experiments, we surpass the state-of-the-art by over 3 dB in PSNR and more than 200× in rendering speed. |
| Researcher Affiliation | Collaboration | Tobias Fischer¹, Jonas Kulhanek¹,³, Samuel Rota Bulò², Lorenzo Porzi², Marc Pollefeys¹, Peter Kontschieder²; ¹ETH Zürich, ²Meta Reality Labs, ³CTU Prague |
| Pseudocode | No | The paper describes the method and its components but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We released the full source code for reproducing our experiments. All datasets used are publicly available. |
| Open Datasets | Yes | We utilize the recently proposed NVS benchmark [17] of Argoverse 2 [81]... We use the established Waymo Open [23], KITTI [21] and VKITTI2 [22] benchmarks... |
| Dataset Splits | Yes | For Argoverse 2, we follow the experimental setup of [17]. In particular, we use the full-resolution 1550 × 2080 images for training and evaluation and use all cameras of every 10th temporal frame as the testing split. ... For KITTI and VKITTI, we follow the established benchmark used in [16, 83, 17, 73]. We use the full-resolution 375 × 1242 images for training and evaluation and evaluate at varying training set fractions. |
| Hardware Specification | Yes | In our multi-sequence experiments in Table 1 and Table 5, we train our model on 8 NVIDIA A100 40GB GPUs for 125,000 steps, taking approximately 2.5 days. In our single-sequence experiments, we train our model on a single RTX 4090 GPU for several hours. |
| Software Dependencies | No | We implement our method in PyTorch [80] with tools from nerfstudio [85]. |
| Experiment Setup | Yes | We use λ_rgb := 0.8, λ_ssim := 0.2 and λ_depth := 0.05. We use the Adam optimizer [86] with β₁ := 0.9, β₂ := 0.999. We use separate learning rates for each 3D Gaussian attribute, the neural fields, and the sequence latent codes ω_s^t. In particular, for means µ, we use an exponential decay learning rate schedule from 1.6 × 10⁻⁵ to 1.6 × 10⁻⁶; for opacity α, we use a learning rate of 5 × 10⁻²; for scales a and rotations q, we use a learning rate of 10⁻³. The neural fields are trained with an exponential decay learning rate schedule from 2.5 × 10⁻³ to 2.5 × 10⁻⁴. The sequence latent vectors ω_s^t are optimized with a learning rate of 5 × 10⁻⁴. We optimize camera and object pose parameters with an exponential decay learning rate schedule from 10⁻⁵ to 10⁻⁶. To counter pose drift, we apply weight decay with a factor of 10⁻². |
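The Dataset Splits row quotes the rule "all cameras of every 10th temporal frame as the testing split" for Argoverse 2. A minimal sketch of that holdout rule, assuming 0-based frame indices and an illustrative frame count (the function name and the choice of starting the holdout at frame 0 are our assumptions, not values from the paper):

```python
# Sketch of the quoted split rule: every 10th temporal frame is held out
# for testing; all remaining frames are used for training.
# The 100-frame sequence length is illustrative only.

def split_frames(num_frames, test_every=10):
    """Return (train_ids, test_ids) holding out every `test_every`-th frame."""
    test_ids = list(range(0, num_frames, test_every))
    train_ids = [i for i in range(num_frames) if i % test_every != 0]
    return train_ids, test_ids

train_ids, test_ids = split_frames(100)
print(len(train_ids), len(test_ids))  # 90 10
```

At test time, all camera images belonging to a held-out temporal frame would be evaluated, per the quoted setup.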
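One way to read the exponential-decay schedules in the Experiment Setup row: a per-step multiplicative factor is chosen so the rate decays geometrically from the initial to the final value over the run. A pure-Python sketch, taking the 125,000-step count from the Hardware Specification row (the helper name is ours, not the paper's):

```python
# Sketch of an exponential learning-rate decay from lr_init to lr_final
# over num_steps steps, as described for the Gaussian means
# (1.6e-5 -> 1.6e-6), the neural fields (2.5e-3 -> 2.5e-4), and the
# pose parameters (1e-5 -> 1e-6). Helper name is illustrative.

def exp_decay_lr(lr_init, lr_final, num_steps, step):
    """Learning rate at `step` under a geometric decay schedule."""
    gamma = (lr_final / lr_init) ** (1.0 / num_steps)  # per-step factor
    return lr_init * gamma ** step

num_steps = 125_000  # multi-sequence training length from the report
print(exp_decay_lr(1.6e-5, 1.6e-6, num_steps, 0))          # 1.6e-05
print(exp_decay_lr(1.6e-5, 1.6e-6, num_steps, num_steps))  # ~1.6e-06
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.ExponentialLR` with `gamma` set as above, with one parameter group per attribute so the means, opacities, scales, rotations, neural fields, latent codes, and poses each get their own rate.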