OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos

Authors: Ziyang Song, Jinxi Li, Bo Yang

ICML 2024

Each entry below gives a reproducibility variable, its result, and the supporting LLM response.

Research Type: Experimental
LLM Response: Extensive experiments show that our method surpasses all baselines and achieves superior accuracy in dynamic novel view synthesis on multiple synthetic and real-world datasets. Most notably, our method demonstrates a clear advantage in learning fine-grained 3D scene geometry.

Researcher Affiliation: Academia
LLM Response: 1Shenzhen Research Institute, The Hong Kong Polytechnic University, Shenzhen, China; 2vLAR Group, The Hong Kong Polytechnic University, Hung Hom, HKSAR. Correspondence to: Bo Yang <bo.yang@polyu.edu.hk>.

Pseudocode: No
LLM Response: The paper describes its joint training procedure and components in text but does not include any formal pseudocode blocks or algorithms labeled as such.

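In lieu of formal pseudocode, here is a minimal Python skeleton of the two-stage flow the paper describes in text (Stage 1 per-object bootstrapping, then Stage 2 scene-level optimization). All function names, bodies, and the Stage 2 iteration count are assumptions for illustration, not the authors' implementation; only the two-stage structure and the Stage 1 iteration count come from the paper.

```python
# Hypothetical skeleton of OSN's two-stage training as described in prose.
# Names and the Stage 2 count are illustrative guesses; 30K matches the
# Stage 1 schedule reported for two of the three datasets.

def bootstrap_object(obj, n_iters=30_000):
    """Stage 1: fit one object's scale-invariant representation
    using that object's RGB and depth losses (step details omitted)."""
    for _ in range(n_iters):
        pass  # one gradient step on the per-object RGB + depth losses

def scene_step(scene):
    """Stage 2: one optimization step over the composed scene, driven by
    scene-level RGB, depth, and segmentation losses (details omitted)."""
    pass

def train(objects, scene, n_stage2_iters=10_000):  # Stage 2 count assumed
    for obj in objects:                  # Stage 1: bootstrap each object
        bootstrap_object(obj)
    for _ in range(n_stage2_iters):      # Stage 2: scene-level refinement
        scene_step(scene)
```
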
Open Source Code: Yes
LLM Response: Our code and data are available at https://github.com/vLAR-group/OSN

Open Datasets: Yes
LLM Response: Our method is primarily evaluated on three public datasets: 1) an adapted version of the synthetic Dynamic Indoor Scene Dataset (Li et al., 2023b) with 4 scenes, and each scene has 3-4 objects with different rigid motions captured; 2) the real-world Oxford Multimotion Dataset (Judd & Gammell, 2019) with 4 scenes selected, and each scene contains 2-4 rigid dynamic objects; and 3) the popular but relatively simple real-world NVIDIA Dynamic Scene Dataset (Yoon et al., 2020) with 3 scenes selected (deformable scenes excluded), as each scene has only one moving object.

Dataset Splits: No
LLM Response: The paper describes training and testing splits for the datasets (e.g., '15 frames for the training split, while leaving the 210 frames at held-out viewpoints and time instances for the testing split' for the Dynamic Indoor Scene Dataset) but does not mention a separate validation split.

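For concreteness, here is a minimal sketch of how the quoted 15/210 split could be materialized, assuming contiguous frame indices; this is an illustrative assumption, since the 210 test frames are actually held-out viewpoints and time instances rather than a contiguous tail, and the paper reports no validation split.

```python
# Illustrative split for the Dynamic Indoor Scene setup quoted above.
# Assumes 225 total frames indexed contiguously (an assumption).
all_frames = list(range(225))   # 15 train + 210 test (assumed total)
train_split = all_frames[:15]   # frames used for optimization
test_split = all_frames[15:]    # held-out viewpoints and time instances
val_split = []                  # no validation split is reported

assert len(train_split) == 15 and len(test_split) == 210
```
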
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies: No
LLM Response: The paper mentions using the Adam optimizer and the TensoRF model, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.

Experiment Setup: Yes
LLM Response: Loss Weights: In Stage 1 Bootstrapping Per-object Representation, for each object, the RGB loss $\ell_{rgb}^{k}$ and depth loss $\ell_{depth}^{k}$ are weighted by {1.0, 1.0}. In Stage 2 Alternative Optimization, the RGB loss $\ell_{rgb}^{scene}$, the depth loss $\ell_{depth}^{scene}$, and the segmentation loss $\ell_{seg}^{scene}$ are weighted by {1.0, 1.0, 0.01} throughout training. Training Schedule: We adopt the Adam optimizer with a learning rate of 0.001 for both the object scale-invariant representation module and the object scale network. We optimize the former for a total of 30K/30K/80K iterations on the Dynamic Indoor Scene/Oxford Multimotion/NVIDIA Dynamic Scene datasets.

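The reported weights and schedule map directly onto code. Below is a minimal PyTorch sketch; the stand-in modules and loss inputs are hypothetical, while the loss weights {1.0, 1.0} and {1.0, 1.0, 0.01}, the Adam optimizer, the 0.001 learning rate, and the two optimized modules are taken from the paper.

```python
import torch
import torch.nn as nn

# Stand-ins for the object scale-invariant representation and the object
# scale network; the real architectures are not specified here.
object_repr = nn.Linear(3, 4)  # hypothetical placeholder
scale_net = nn.Linear(8, 1)    # hypothetical placeholder

def stage1_loss(l_rgb, l_depth, w=(1.0, 1.0)):
    """Stage 1 per-object objective: RGB and depth weighted {1.0, 1.0}."""
    return w[0] * l_rgb + w[1] * l_depth

def stage2_loss(l_rgb, l_depth, l_seg, w=(1.0, 1.0, 0.01)):
    """Stage 2 scene objective: RGB, depth, segmentation weighted
    {1.0, 1.0, 0.01}."""
    return w[0] * l_rgb + w[1] * l_depth + w[2] * l_seg

# Adam with learning rate 0.001 for both modules, as stated in the paper.
optimizer = torch.optim.Adam(
    list(object_repr.parameters()) + list(scale_net.parameters()),
    lr=1e-3,
)
```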