DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

Authors: Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study."
Researcher Affiliation | Academia | "Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki, Carnegie Mellon University, {wenhsuac,leik,katef}@cs.cmu.edu"
Pseudocode | No | The paper describes its method using text and diagrams but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | "We will release our code and hope our work will stimulate more research on fine-grained 4D understanding from videos."
Open Datasets | Yes | "We evaluate the performance of DreamScene4D on more challenging multi-object video datasets, including DAVIS [37], Kubric [15], and some self-captured videos with large object motion."
Dataset Splits | No | The paper discusses optimization steps and batch sizes for training but does not explicitly mention validation dataset splits (e.g., specific percentages or counts for a validation set separate from training and testing).
Hardware Specification | Yes | "We run our experiments on one 40GB A100 GPU."
Software Dependencies | No | The paper mentions specific frameworks such as Gaussian Splatting [20, 51], Zero-1-to-3 [26], SD-Inpaint [45], K-planes [13], and the AdamW optimizer, but does not provide version numbers for any of these software dependencies or the programming language.
Experiment Setup | Yes | "We crop and scale the individual objects to around 65% of the image size for object lifting. For static 3D Gaussian optimization, we optimize for 1000 iterations with a batch size of 16. For optimizing the dynamic components, we optimize for 100 times the number of frames with a batch size of 10. More implementation and running time details are provided in the appendix. We use the same set of hyperparameters as DreamGaussian and use a learning rate that decays from 1e-3 to 2e-5 for the position, a static learning rate of 0.01 for the spherical harmonics, 0.05 for the opacity, and 5e-3 for the scale and rotation. The learning rate of the HexPlane grid is set to 6.4e-4 while the learning rate of the MLP prediction heads is set to 6.4e-3. We use the AdamW optimizer for all our optimization processes."
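The hyperparameters quoted above can be collected into a small configuration sketch. This is a minimal illustration, not the authors' code: the parameter-group names are hypothetical, and the exponential shape of the position-learning-rate decay is an assumption (the paper only states the two endpoints).

```python
import math

def position_lr(step: int, total_steps: int,
                lr_init: float = 1e-3, lr_final: float = 2e-5) -> float:
    """Decay the position learning rate from lr_init to lr_final.

    The paper reports only the start and end values; the log-linear
    (exponential) interpolation used here is an assumption.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return math.exp((1.0 - t) * math.log(lr_init) + t * math.log(lr_final))

# Static learning rates for the remaining parameters, as quoted in the
# Experiment Setup row (group names are illustrative, not from the paper):
PARAM_GROUP_LRS = {
    "spherical_harmonics": 0.01,
    "opacity": 0.05,
    "scale": 5e-3,
    "rotation": 5e-3,
    "hexplane_grid": 6.4e-4,
    "mlp_prediction_heads": 6.4e-3,
}
```

These per-group rates would typically be passed as parameter groups to an AdamW optimizer, matching the paper's statement that AdamW is used for all optimization processes.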