S-NeRF: Neural Radiance Fields for Street Views

Authors: Ziyang Xie, Junge Zhang, Wenye Li, Feihu Zhang, Li Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Thorough experiments on the large-scale driving datasets (e.g., nuScenes and Waymo) demonstrate that our method beats the state-of-the-art rivals by reducing 7~40% of the mean-squared error in the street-view synthesis and a 45% PSNR gain for the moving vehicles rendering.
Researcher Affiliation | Academia | Ziyang Xie1, Junge Zhang1, Wenye Li1, Feihu Zhang2, Li Zhang1 (1Fudan University, 2University of Oxford)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://ziyang-xie.github.io/s-nerf
Open Datasets | Yes | We perform our experiments on two open source self-driving datasets: nuScenes (Caesar et al., 2019) and Waymo (Sun et al., 2020).
Dataset Splits | Yes | For the foreground vehicles, we extract car crops from nuScenes and Waymo video sequences. For each vehicle, there are around 2~8 views used for training and 1~3 views for testing.
Hardware Specification | Yes | The training takes about 2 hours for each vehicle on a single RTX 3090 GPU. Our S-NeRF is trained on two RTX 3090 GPUs, which takes about 17 hours for a scene with about 250 images (at a resolution of 1280x1920).
Software Dependencies | No | We use the NLSPN (Park et al., 2020) network for depth completion, which propagates the depth information from LiDAR points to surrounding pixels.
Experiment Setup | Yes | In all the experiments, the depth and smooth loss weights λ1 and λ2 are set to 1 and 0.15 respectively for foreground vehicles. For background street scenes, we set τ = 20% for confidence measurement and the radius r = 3 in all scenes; λ1 = 0.2 and λ2 = 0.01 are used as the loss balance weights. We train our S-NeRF for 30k iterations using the Adam optimizer with 5x10^-4 as the learning rate and 1024 as the batch size. The learning rate is reduced log-linearly from 5x10^-4 to 5x10^-6 with a warm-up phase of 2500 iterations.
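The learning-rate schedule quoted in the experiment setup (log-linear decay from 5x10^-4 to 5x10^-6 over 30k iterations, with a 2500-iteration warm-up) can be sketched as a plain Python function. This is a hypothetical reconstruction, not the authors' code: the exact warm-up shape is not stated in the quote, so a linear ramp is assumed here.

```python
import math

def lr_schedule(step, base_lr=5e-4, final_lr=5e-6,
                total_steps=30_000, warmup_steps=2_500):
    """Log-linear LR decay with a warm-up phase (assumed linear ramp)."""
    if step < warmup_steps:
        # Linear warm-up from 0 to base_lr (shape is an assumption).
        return base_lr * step / warmup_steps
    # Interpolate in log space so the rate falls log-linearly
    # from base_lr down to final_lr over the remaining steps.
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return math.exp((1 - t) * math.log(base_lr) + t * math.log(final_lr))
```

At the end of warm-up the function returns the base rate (5e-4), and at step 30,000 it reaches the final rate (5e-6); intermediate steps fall on a straight line in log space, matching the "reduced log-linearly" description.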