OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos
Authors: Ziyang Song, Jinxi Li, Bo Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method surpasses all baselines and achieves superior accuracy in dynamic novel view synthesis on multiple synthetic and real-world datasets. Most notably, our method demonstrates a clear advantage in learning fine-grained 3D scene geometry. |
| Researcher Affiliation | Academia | ¹Shenzhen Research Institute, The Hong Kong Polytechnic University, Shenzhen, China; ²vLAR Group, The Hong Kong Polytechnic University, Hung Hom, HKSAR. Correspondence to: Bo Yang <bo.yang@polyu.edu.hk>. |
| Pseudocode | No | The paper describes its joint training procedure and components in text but does not include any formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Our code and data are available at https://github.com/vLAR-group/OSN |
| Open Datasets | Yes | Our method is primarily evaluated on three public datasets: 1) an adapted version of the synthetic Dynamic Indoor Scene Dataset (Li et al., 2023b) with 4 scenes, each containing 3–4 objects with different rigid motions captured, 2) the real-world Oxford Multimotion Dataset (Judd & Gammell, 2019) with 4 scenes selected, each containing 2–4 rigid dynamic objects, and 3) the popular but relatively simple real-world NVIDIA Dynamic Scene Dataset (Yoon et al., 2020) with 3 scenes selected (deformable scenes excluded), as each scene has only one moving object. |
| Dataset Splits | No | The paper describes training and testing splits for the datasets (e.g., '15 frames for the training split, while leaving the 210 frames at held-out viewpoints and time instances for the testing split' for Dynamic Indoor Scene Dataset) but does not mention a separate validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and TensoRF model, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Loss Weights: In Stage 1 Bootstrapping Per-object Representation, for each object, the RGB loss $\ell^k_{\text{rgb}}$ and depth loss $\ell^k_{\text{depth}}$ are weighted by {1.0, 1.0}. In Stage 2 Alternative Optimization, the RGB loss $\ell^{\text{scene}}_{\text{rgb}}$, the depth loss $\ell^{\text{scene}}_{\text{depth}}$, and the segmentation loss $\ell^{\text{scene}}_{\text{seg}}$ are weighted by {1.0, 1.0, 0.01} throughout training. Training Schedule: We adopt the Adam optimizer with a learning rate of 0.001 for both the object scale-invariant representation module and the object scale network. We optimize the former for a total of 30K/30K/80K iterations on the Dynamic Indoor Scene/Oxford Multimotion/NVIDIA Dynamic Scene datasets, respectively. (See the sketch after this table.) |
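
The Experiment Setup row above translates into a minimal PyTorch sketch. The loss weights, learning rate, and per-dataset iteration counts are taken from the paper; everything else (the `stage1_loss`/`stage2_loss` helpers, the placeholder `Linear` modules standing in for the scale-invariant representation and scale network, and the dictionary keys) is a hypothetical stand-in, not the authors' implementation.

```python
import torch

# Loss weights reported in the paper's Experiment Setup.
STAGE1_WEIGHTS = {"rgb": 1.0, "depth": 1.0}               # Stage 1: per-object bootstrap
STAGE2_WEIGHTS = {"rgb": 1.0, "depth": 1.0, "seg": 0.01}  # Stage 2: scene-level optimization

def stage1_loss(l_rgb_k: torch.Tensor, l_depth_k: torch.Tensor) -> torch.Tensor:
    """Stage 1: weighted RGB + depth loss for a single object k."""
    return STAGE1_WEIGHTS["rgb"] * l_rgb_k + STAGE1_WEIGHTS["depth"] * l_depth_k

def stage2_loss(l_rgb: torch.Tensor, l_depth: torch.Tensor, l_seg: torch.Tensor) -> torch.Tensor:
    """Stage 2: weighted scene-level RGB, depth, and segmentation losses."""
    w = STAGE2_WEIGHTS
    return w["rgb"] * l_rgb + w["depth"] * l_depth + w["seg"] * l_seg

# Placeholder modules: the real architectures are the object scale-invariant
# representation and the object scale network described in the paper.
repr_module = torch.nn.Linear(3, 3)  # stand-in for the scale-invariant representation
scale_net = torch.nn.Linear(3, 1)    # stand-in for the object scale network

# Adam with lr = 0.001 for both modules, as stated in the paper.
optimizer = torch.optim.Adam(
    list(repr_module.parameters()) + list(scale_net.parameters()),
    lr=1e-3,
)

# Iteration budgets per dataset (30K / 30K / 80K) as reported in the paper.
NUM_ITERS = {
    "dynamic_indoor_scene": 30_000,
    "oxford_multimotion": 30_000,
    "nvidia_dynamic_scene": 80_000,
}
```

This only pins down the hyperparameters the table documents; the actual loss terms are rendered-vs-observed errors computed by the OSN pipeline, for which the released code at https://github.com/vLAR-group/OSN is the authoritative reference.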