DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization

Authors: Yanpeng Zhao, Siyu Gao, Yunbo Wang, Xiaokang Yang

ICLR 2024

Reproducibility variables, each with its result and the supporting LLM response:
Research Type: Experimental
LLM Response: "In our experiments, we initially evaluate DynaVol on simulated 3D dynamic scenes that contain different numbers of objects, diverse motions, shapes (such as cubes, spheres, and real-world shapes), and materials (such as rubber and metal). On the simulated dataset, we can directly assess the performance of DynaVol for scene decomposition by projecting the object-centric volumetric representations onto 2D planes, and compare it with existing approaches, such as SAVi (Kipf et al., 2022) and uORF (Yu et al., 2022). Additionally, we demonstrate the effectiveness of DynaVol in novel view synthesis and dynamic scene editing using real-world videos."
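The quoted evaluation hinges on projecting the object-centric volumetric representations onto 2D planes to obtain segmentation masks. Below is a minimal sketch of one plausible reading, assuming per-slot densities sampled along camera rays and standard volume-rendering weights; the array layout and function name are illustrative, not taken from the paper:

```python
import numpy as np

def project_slots_to_mask(sigma, delta):
    """Project per-object densities sampled along camera rays onto a 2D
    segmentation mask via volume-rendering weights (illustrative sketch).

    sigma: (H, W, S, N) non-negative densities for N object slots at
           S samples along each of H*W rays (assumed layout).
    delta: (S,) distances between consecutive samples.
    Returns an (H, W) array of per-pixel object indices.
    """
    # Total density at each sample is the sum over object slots.
    sigma_total = sigma.sum(axis=-1)                        # (H, W, S)
    alpha = 1.0 - np.exp(-sigma_total * delta)              # opacity per sample
    # Transmittance: probability a ray reaches each sample unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    weights = alpha * trans                                 # (H, W, S)
    # Split each sample's rendering weight across slots by density share.
    share = sigma / (sigma_total[..., None] + 1e-10)        # (H, W, S, N)
    per_object = (weights[..., None] * share).sum(axis=2)   # (H, W, N)
    return per_object.argmax(axis=-1)                       # (H, W)
```

The resulting per-pixel object indices can then be scored against ground-truth masks on the simulated scenes, which is what makes the direct comparison with SAVi and uORF possible.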
Researcher Affiliation: Academia
LLM Response: "Yanpeng Zhao, Siyu Gao, Yunbo Wang, Xiaokang Yang. MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University. {zhao-yan-peng, siyu.gao, yunbow, xkyang}@sjtu.edu.cn"
Pseudocode: Yes
LLM Response: "Algorithm 1: Pseudocode of the 3D-to-4D voxel expansion algorithm"
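Algorithm 1 itself is not reproduced in this report. As a hedged illustration of the plain reading of its name, the sketch below tiles a static 3D grid along a new time axis so that every timestamp starts from the warmup-stage values; the paper's actual expansion and initialization may differ:

```python
import numpy as np

def expand_3d_to_4d(voxel_grid, num_timestamps):
    """Expand a static 3D voxel grid into a 4D (time-varying) grid by
    replicating it at every timestamp. Illustrative only: the paper's
    Algorithm 1 may initialize the expanded grid differently.

    voxel_grid: (X, Y, Z, C) array, e.g. per-voxel density/slot features.
    Returns a (T, X, Y, Z, C) array, one grid copy per timestamp.
    """
    grid_4d = np.broadcast_to(voxel_grid[None], (num_timestamps,) + voxel_grid.shape)
    # Return a writable copy so each timestamp can be optimized independently.
    return np.array(grid_4d)

# Small stand-in for the paper's 110^3 grid with N = 10 slots, expanded
# over the 60 timestamps each synthetic scene spans.
grid = np.zeros((16, 16, 16, 10), dtype=np.float32)
grid_4d = expand_3d_to_4d(grid, num_timestamps=60)
assert grid_4d.shape == (60, 16, 16, 16, 10)
```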
Open Source Code: No
LLM Response: The paper provides a link to a project page (https://sites.google.com/view/dynavol/), which stated 'Code coming soon!' at the time of review. This does not constitute concrete access to the source code for the described methodology.
Open Datasets: Yes
LLM Response: "We build the 8 synthetic dynamic scenes in Table 1 using the Kubric simulator (Greff et al., 2022). Each scene spans 60 timestamps and contains different numbers of objects in various colors, shapes, and textures. We also adopt 4 real-world scenes from HyperNeRF (Park et al., 2021) and D2NeRF (Wu et al., 2022), as shown in Table 2."
Dataset Splits: No
LLM Response: The paper describes how the data is used (e.g., "evaluated over 60 novel views"), but it does not provide the specific percentages or sample counts of train/validation/test splits needed to reproduce the data partitioning.
Hardware Specification: Yes
LLM Response: "All experiments run on an NVIDIA RTX3090 GPU and last for about 3 hours."
Software Dependencies: No
LLM Response: The paper mentions using the Adam optimizer but does not specify the versions of programming languages, libraries, or other software dependencies required to reproduce the experiments.
Experiment Setup: Yes
LLM Response: "We set the size of the voxel grid to 110^3, the assumed number of maximum objects to N = 10, and the dimension of slot features to D = 64. We use 4 hidden layers with 64 channels in the renderer, and use the Adam optimizer with a batch of 1,024 rays in the two training stages. The base learning rates are 0.1 for the voxel grids and 1e-3 for all model parameters in the warmup stage, and are then adjusted to 0.08 and 8e-4 in the second training stage. The two training stages last for 50k and 35k iterations, respectively. The hyperparameters in the loss functions are set to α_p = 0.1, α_e = 0.01, α_w = 1.0, α_c = 1.0."
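For reference, the quoted setup can be collected into a single configuration sketch. The values below come directly from the passage; the field names are assumptions, since the paper does not publish a config file:

```python
from dataclasses import dataclass, field

@dataclass
class DynaVolConfig:
    """Hyperparameters quoted in the paper; field names are assumptions."""
    voxel_grid_size: tuple = (110, 110, 110)  # 110^3 voxel grid
    max_objects: int = 10                     # N, assumed maximum object count
    slot_dim: int = 64                        # D, slot feature dimension
    renderer_hidden_layers: int = 4
    renderer_channels: int = 64
    rays_per_batch: int = 1024                # Adam optimizer, both stages
    # (warmup stage, second stage) learning rates and iteration counts.
    lr_voxel: tuple = (0.1, 0.08)
    lr_model: tuple = (1e-3, 8e-4)
    iterations: tuple = (50_000, 35_000)
    # Loss weights alpha_p, alpha_e, alpha_w, alpha_c.
    loss_weights: dict = field(default_factory=lambda: dict(
        alpha_p=0.1, alpha_e=0.01, alpha_w=1.0, alpha_c=1.0))
```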