EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we benchmark the reconstruction capabilities of EmerNeRF against prior methods, focusing on static and dynamic scene reconstruction, novel view synthesis, scene flow estimation, and foundation model feature reconstruction. Further ablation studies and a discussion of EmerNeRF's limitations can be found in Appendices C.1 and C.3, respectively.
Researcher Affiliation | Collaboration | University of Southern California; Georgia Institute of Technology (danfei@gatech.edu); University of Toronto (fidler@cs.toronto.edu); Stanford University (pavone@stanford.edu); Technion (orlitany@gmail.com); NVIDIA Research ({bivanovic,xweng,seungwookk,boyil,tongc}@nvidia.com)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | See the project page for code, data, and request pre-trained models: https://emernerf.github.io
Open Datasets | Yes | To remedy this, we introduce the NeRF On-The-Road (NOTR) benchmark, a balanced and diverse benchmark derived from the Waymo Open Dataset (Sun et al., 2020).
Dataset Splits | No | For scene reconstruction, all samples in a log are used for training. For novel view synthesis, we omit every 10th timestep, resulting in 10% novel views for evaluation. The paper does not explicitly mention a separate validation split. (A minimal sketch of this split logic appears below the table.)
Hardware Specification | Yes | Training durations on a single A100 GPU are as follows: for static scenes, feature-free training requires 33 minutes, while the feature-embedded approach takes 40 minutes. Dynamic scene training, which incorporates the flow field and feature aggregation, extends the durations to 2 hours for feature-free and 2.25 hours for feature-embedded representations.
Software Dependencies | No | The paper mentions using toolkits such as tiny-cuda-nn and nerfacc, as well as JAX implementations for baselines, but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train our models for 25k iterations using a batch size of 8196. To mitigate excessive regularization when the geometry prediction is not reliable, we enable line-of-sight loss after the initial 2k iterations and subsequently halve its coefficient every 5k iterations. (A sketch of this loss schedule also appears below the table.)
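
The Dataset Splits row above describes holding out every 10th timestep for novel view synthesis evaluation. The following is a minimal Python sketch of that logic, not code from the paper's release; the offset of the held-out frames (index 0 here) and the function name are assumptions.

```python
# Hypothetical sketch of the train/eval split described in the Dataset Splits row:
# every 10th timestep is withheld for novel view synthesis evaluation (~10% of frames),
# and all remaining timesteps are used for training. The starting offset is assumed.
def split_timesteps(num_timesteps: int, holdout_every: int = 10):
    eval_ids = [t for t in range(num_timesteps) if t % holdout_every == 0]
    train_ids = [t for t in range(num_timesteps) if t % holdout_every != 0]
    return train_ids, eval_ids

train_ids, eval_ids = split_timesteps(200)
print(len(eval_ids) / 200)  # 0.1 -> roughly 10% of frames held out for evaluation
```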
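
The Experiment Setup row quotes a schedule for the line-of-sight loss: disabled for the first 2k iterations, then its coefficient halved every 5k iterations. Below is a small Python sketch of one way such a schedule could be implemented; the base coefficient value (0.1) and the choice to count halvings from the end of the warm-up are assumptions, not values taken from the paper.

```python
# Hypothetical line-of-sight loss coefficient schedule (sketch only):
# zero during the first `warmup` iterations, then halved every `halve_every` iterations.
# `base` is an assumed starting coefficient, not specified in the paper.
def line_of_sight_coeff(step: int, base: float = 0.1,
                        warmup: int = 2_000, halve_every: int = 5_000) -> float:
    if step < warmup:
        return 0.0  # loss disabled while geometry predictions are still unreliable
    num_halvings = (step - warmup) // halve_every
    return base * 0.5 ** num_halvings

# Coefficient at a few points of a 25k-iteration run.
for step in (0, 2_000, 7_000, 12_000, 24_999):
    print(step, line_of_sight_coeff(step))
```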