EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we benchmark the reconstruction capabilities of EmerNeRF against prior methods, focusing on static and dynamic scene reconstruction, novel view synthesis, scene flow estimation, and foundation model feature reconstruction. Further ablation studies and a discussion of EmerNeRF's limitations can be found in Appendices C.1 and C.3, respectively. |
| Researcher Affiliation | Collaboration | University of Southern California; Georgia Institute of Technology (danfei@gatech.edu); University of Toronto (fidler@cs.toronto.edu); Stanford University (pavone@stanford.edu); Technion (orlitany@gmail.com); NVIDIA Research ({bivanovic,xweng,seungwookk,boyil,tongc}@nvidia.com) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | See the project page for code and data, and to request pre-trained models: https://emernerf.github.io |
| Open Datasets | Yes | To remedy this, we introduce the NeRF On-The-Road (NOTR) benchmark, a balanced and diverse benchmark derived from the Waymo Open Dataset (Sun et al., 2020). |
| Dataset Splits | No | For scene reconstruction, all samples in a log are used for training. For novel view synthesis, we omit every 10th timestep, resulting in 10% novel views for evaluation. The paper does not explicitly mention a separate validation split (a minimal split sketch follows the table). |
| Hardware Specification | Yes | Training durations on a single A100 GPU are as follows: for static scenes, feature-free training requires 33 minutes, while the feature-embedded approach takes 40 minutes. Dynamic scene training, which incorporates the flow field and feature aggregation, extends the durations to 2 hours for feature-free and 2.25 hours for feature-embedded representations. |
| Software Dependencies | No | The paper mentions using the tiny-cuda-nn and nerfacc toolkits, and JAX implementations for baselines, but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train our models for 25k iterations using a batch size of 8196. To mitigate excessive regularization when the geometry prediction is not reliable, we enable line-of-sight loss after the initial 2k iterations and subsequently halve its coefficient every 5k iterations (a sketch of this schedule follows the table). |
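
The dataset-split rule quoted in the "Dataset Splits" row is simple enough to state as code. Below is a minimal sketch assuming a log indexed by integer timesteps; the function name `make_splits` and its arguments are illustrative and not taken from the released EmerNeRF code.

```python
# Minimal sketch of the split described above: for novel view synthesis,
# every 10th timestep is held out for evaluation (10% of frames); for
# scene reconstruction, all frames are used for training.
# All names here are hypothetical.

def make_splits(num_timesteps: int, novel_view: bool = True):
    """Return (train_ids, eval_ids) over a log's timesteps."""
    all_ids = list(range(num_timesteps))
    if not novel_view:
        # Scene reconstruction: train on every sample, nothing held out.
        return all_ids, []
    eval_ids = [t for t in all_ids if t % 10 == 0]   # every 10th frame
    train_ids = [t for t in all_ids if t % 10 != 0]
    return train_ids, eval_ids

# e.g. a 200-frame log yields 20 evaluation frames (10%)
train_ids, eval_ids = make_splits(200)
assert len(eval_ids) == 20
```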
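The line-of-sight loss schedule in the "Experiment Setup" row can likewise be made concrete. The sketch below assumes a per-step coefficient lookup; the base coefficient value and the function name are hypothetical, since the quoted passage does not specify them.

```python
# A minimal sketch, assuming the schedule quoted above: the loss is
# disabled for the first 2k iterations, then its coefficient is halved
# every 5k iterations. `base_coeff=0.1` is a placeholder value.

def line_of_sight_coeff(step: int, base_coeff: float = 0.1,
                        enable_at: int = 2_000,
                        halve_every: int = 5_000) -> float:
    """Coefficient of the line-of-sight loss at a given training step."""
    if step < enable_at:
        return 0.0  # disabled while geometry predictions are unreliable
    num_halvings = (step - enable_at) // halve_every
    return base_coeff * (0.5 ** num_halvings)

# e.g. step 1,000 -> 0.0; step 2,000 -> 0.1; step 7,000 -> 0.05
```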