DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization
Authors: Yanpeng Zhao, Siyu Gao, Yunbo Wang, Xiaokang Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we initially evaluate DynaVol on simulated 3D dynamic scenes that contain different numbers of objects, diverse motions, shapes (such as cubes, spheres, and real-world shapes), and materials (such as rubber and metal). On the simulated dataset, we can directly assess the performance of DynaVol for scene decomposition by projecting the object-centric volumetric representations onto 2D planes, and compare it with existing approaches, such as SAVi (Kipf et al., 2022) and uORF (Yu et al., 2022). Additionally, we demonstrate the effectiveness of DynaVol in novel view synthesis and dynamic scene editing using real-world videos. |
| Researcher Affiliation | Academia | Yanpeng Zhao, Siyu Gao, Yunbo Wang, Xiaokang Yang, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University {zhao-yan-peng, siyu.gao, yunbow, xkyang}@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Pseudocode of the 3D-to-4D voxel expansion algorithm |
| Open Source Code | No | The paper provides a link to a project page (https://sites.google.com/view/dynavol/), which states 'Code coming soon!' at the time of review. This does not constitute concrete access to source code for the methodology described. |
| Open Datasets | Yes | We build the 8 synthetic dynamic scenes in Table 1 using the Kubric simulator (Greff et al., 2022). Each scene spans 60 timestamps and contains different numbers of objects in various colors, shapes, and textures. We also adopt 4 real-world scenes from HyperNeRF (Park et al., 2021) and D2NeRF (Wu et al., 2022), as shown in Table 2. |
| Dataset Splits | No | The paper describes how data is used (e.g., 'evaluated over 60 novel views'), but it does not provide specific percentages or sample counts for train/validation/test dataset splits needed for reproducibility of data partitioning. |
| Hardware Specification | Yes | All experiments run on an NVIDIA RTX3090 GPU and last for about 3 hours. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify versions of programming languages, libraries, or other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | We set the size of the voxel grid to 110³, the assumed number of maximum objects to N = 10, and the dimension of slot features to D = 64. We use 4 hidden layers with 64 channels in the renderer, and use the Adam optimizer with a batch of 1,024 rays in the two training stages. The base learning rates are 0.1 for the voxel grids and 1e-3 for all model parameters in the warmup stage, and are then adjusted to 0.08 and 8e-4 in the second training stage. The two training stages last for 50k and 35k iterations, respectively. The hyperparameters in the loss functions are set to α_p = 0.1, α_e = 0.01, α_w = 1.0, α_c = 1.0. |
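
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary only; the field names (`voxel_grid_size`, `warmup`, `stage2`, etc.) are our own and do not come from the paper's code, which was not released at review time:

```python
# Hedged sketch: DynaVol training hyperparameters as reported in the paper.
# All key names are illustrative; only the numeric values are from the paper.
config = {
    "voxel_grid_size": (110, 110, 110),  # 110^3 voxel grid
    "max_objects": 10,                   # N, assumed maximum number of objects
    "slot_dim": 64,                      # D, dimension of slot features
    "renderer_hidden_layers": 4,         # hidden layers in the renderer
    "renderer_channels": 64,             # channels per hidden layer
    "optimizer": "Adam",
    "batch_rays": 1024,                  # rays per batch in both stages
    "warmup": {                          # first (warmup) training stage
        "lr_voxel": 0.1,                 # base LR for voxel grids
        "lr_model": 1e-3,                # base LR for all model parameters
        "iterations": 50_000,
    },
    "stage2": {                          # second training stage
        "lr_voxel": 0.08,
        "lr_model": 8e-4,
        "iterations": 35_000,
    },
    "loss_weights": {                    # loss hyperparameters
        "alpha_p": 0.1,
        "alpha_e": 0.01,
        "alpha_w": 1.0,
        "alpha_c": 1.0,
    },
}
```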