D^2NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
Authors: Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, Cengiz Oztireli
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new dataset containing various dynamic objects and shadows and demonstrate that our method can achieve better performance than state-of-the-art approaches in decoupling dynamic and static 3D objects, occlusion and shadow removal, and image segmentation for moving objects. Project page: d2nerf.github.io |
| Researcher Affiliation | Collaboration | Tianhao Wu (University of Cambridge); Fangcheng Zhong (University of Cambridge); Andrea Tagliasacchi (Google Research, Simon Fraser University); Forrester Cole (Google Research); Cengiz Oztireli (Google Research, University of Cambridge) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Our method is easily reproducible, as we intend to release code and datasets upon publication to facilitate future research. |
| Open Datasets | Yes | We introduce a new dataset with rigid and non-rigid dynamic objects, rapid camera motion and various moving shadows in both the synthetic and real-world settings to evaluate these two aspects, and show that our method achieves better performance than state-of-the-art approaches. Synthetic dataset: We generate a synthetic dataset with ground-truth masks for moving objects and their shadows with Kubric [16]. This dataset consists of five scenes containing one or multiple dynamic objects from ShapeNet [5] with rigid or non-rigid motion, and the corresponding Kubric worker script is provided in our supplementary material. |
| Dataset Splits | Yes | We move the virtual camera over 10 keyframes randomly sampled from azimuth [2, 2 + π/4] and altitude [1, 1.2] to generate a 200-frame video sequence for training. We also rotate the virtual camera around the center of all keyframes to generate 100 validation views with only the static background being visible. |
| Hardware Specification | Yes | This training procedure spans approximately two hours on four NVIDIA A100-SXM-80GB GPUs. |
| Software Dependencies | No | The paper does not provide specific software versions for its dependencies. |
| Experiment Setup | Yes | The optimization takes 100k iterations with batch size 1024 and an exponentially decayed learning rate from 10⁻³ to 10⁻⁵. For scenes with a mixture of dynamic objects and shadows, we apply shadow decay and set λ_ρ = 0.1. We set λ_ρ = 0.001 for scenes featuring view-correlated dynamic shadows only. We experimentally found that the optimal choice of the hyperparameters, especially λ_b, λ_r and the skewness k, are strongly influenced by the level of object motion, camera motion, and video length. Therefore, we performed a grid search on our synthetic and held-out real-world scenes, and some scenes from DAVIS [42], to establish a set of hyperparameters applicable to a variety of scenarios; details about hyperparameters can be found in the supplementary. |
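
The Dataset Splits row above describes sampling 10 camera keyframes from an azimuth/altitude window, expanding them into a 200-frame training sequence, and orbiting the keyframe center for 100 background-only validation views. The sketch below illustrates that sampling scheme; the camera radius, the exact angle ranges (the quoted bounds appear partially garbled by PDF extraction), and the piecewise-linear interpolation between keyframes are our own assumptions for illustration, not the paper's actual Kubric worker settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical_to_xyz(azimuth, altitude, radius=4.0):
    """Convert spherical angles (radians) to a 3D camera position; radius is a placeholder."""
    return np.array([radius * np.cos(altitude) * np.cos(azimuth),
                     radius * np.cos(altitude) * np.sin(azimuth),
                     radius * np.sin(altitude)])

# 10 keyframes sampled from an azimuth/altitude window (ranges taken from the
# quoted excerpt, which looks partially garbled, so treat them as placeholders).
azimuths  = np.sort(rng.uniform(2.0, 2.0 + np.pi / 4, size=10))
altitudes = rng.uniform(1.0, 1.2, size=10)
keyframes = np.stack([spherical_to_xyz(a, b) for a, b in zip(azimuths, altitudes)])

# 200 training camera positions by piecewise-linear interpolation between keyframes.
t = np.linspace(0, len(keyframes) - 1, 200)
idx = np.clip(t.astype(int), 0, len(keyframes) - 2)
frac = (t - idx)[:, None]
train_cams = (1 - frac) * keyframes[idx] + frac * keyframes[idx + 1]

# 100 validation views: rotate the first keyframe around the centroid of all keyframes.
center = keyframes.mean(axis=0)
offset = keyframes[0] - center
val_cams = np.stack([
    center + np.array([np.cos(a) * offset[0] - np.sin(a) * offset[1],
                       np.sin(a) * offset[0] + np.cos(a) * offset[1],
                       offset[2]])
    for a in np.linspace(0, 2 * np.pi, 100, endpoint=False)
])
```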
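
The Experiment Setup row above quotes an exponentially decayed learning rate from 10⁻³ to 10⁻⁵ over 100k iterations. Below is a minimal sketch of such a schedule, assuming plain log-linear interpolation between the initial and final rates; the function name is our own, and the authors' actual training code (including any warm-up or delayed decay) may differ.

```python
import numpy as np

def exp_decay_lr(step, total_steps=100_000, lr_init=1e-3, lr_final=1e-5):
    """Exponential (log-linear) learning-rate decay, a common NeRF-style schedule.

    Illustrative sketch only; not the authors' released code.
    """
    t = np.clip(step / total_steps, 0.0, 1.0)
    return float(np.exp(np.log(lr_init) * (1 - t) + np.log(lr_final) * t))

# Usage: the rate starts at 1e-3 and reaches 1e-5 at the final iteration.
print(exp_decay_lr(0))        # ~1e-3
print(exp_decay_lr(50_000))   # ~1e-4 (geometric midpoint)
print(exp_decay_lr(100_000))  # ~1e-5
```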