DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
Authors: Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study. |
| Researcher Affiliation | Academia | Wen-Hsuan Chu , Lei Ke , Katerina Fragkiadaki Carnegie Mellon University {wenhsuac,leik,katef}@cs.cmu.edu |
| Pseudocode | No | The paper describes its method using text and diagrams but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We will release our code and hope our work will stimulate more research on fine-grained 4D understanding from videos. |
| Open Datasets | Yes | We evaluate the performance of DreamScene4D on more challenging multi-object video datasets, including DAVIS [37], Kubric [15], and some self-captured videos with large object motion. |
| Dataset Splits | No | The paper discusses optimization steps and batch sizes for training but does not explicitly mention validation dataset splits (e.g., specific percentages or counts for a validation set separate from training and testing). |
| Hardware Specification | Yes | We run our experiments on one 40GB A100 GPU. |
| Software Dependencies | No | The paper mentions using specific frameworks like 'Gaussian Splatting [20, 51]', 'Zero-1-to-3 [26]', 'SD-Inpaint [45]', 'K-plane [13]', and 'Adam W optimizer', but does not provide specific version numbers for any of these software dependencies or programming languages. |
| Experiment Setup | Yes | We crop and scale the individual objects to around 65% of the image size for object lifting. For static 3D Gaussian optimization, we optimize for 1000 iterations with a batch size of 16. For optimizing the dynamic components, we optimize for 100 times the number of frames with a batch size of 10. More implementation and running time details are provided in the appendix. We use the same set of hyperparameters as DreamGaussian and use a learning rate that decays from 1e-3 to 2e-5 for the position, a static learning rate of 0.01 for the spherical harmonics, 0.05 for the opacity, and 5e-3 for the scale and rotation. The learning rate of the Hexplane grid is set to 6.4e-4 while the learning rate of the MLP prediction heads is set to 6.4e-3. We use the AdamW optimizer for all our optimization processes. |
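
The learning rates quoted in the Experiment Setup row can be sketched as a small configuration in Python. Note the assumptions: the paper gives only the endpoints of the position decay (1e-3 to 2e-5), so the exponential shape below follows DreamGaussian's convention rather than anything stated in the quote, and all function and key names are illustrative.

```python
def position_lr(step, total_steps=1000, lr_start=1e-3, lr_end=2e-5):
    """Decay the Gaussian-position learning rate from lr_start to lr_end.

    An exponential schedule is assumed here; the paper specifies only the
    start and end values. total_steps matches the 1000 static-optimization
    iterations quoted above.
    """
    t = min(max(step / total_steps, 0.0), 1.0)  # normalized progress in [0, 1]
    return lr_start * (lr_end / lr_start) ** t

# Static per-parameter-group learning rates, taken verbatim from the quote.
STATIC_LRS = {
    "spherical_harmonics": 0.01,
    "opacity": 0.05,
    "scale_rotation": 5e-3,
    "hexplane_grid": 6.4e-4,
    "mlp_heads": 6.4e-3,
}
```

These values would typically be passed to an AdamW optimizer as separate parameter groups, with the position group's learning rate updated each iteration via `position_lr(step)`.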