Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models
Authors: Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang "Atlas" Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency across various prompt modalities. |
| Researcher Affiliation | Academia | Hanwen Liang¹, Yuyang Yin², Dejia Xu³, Hanxue Liang⁴, Zhangyang Wang³, Konstantinos N. Plataniotis¹, Yao Zhao², Yunchao Wei²; ¹University of Toronto, ²Beijing Jiaotong University, ³University of Texas at Austin, ⁴University of Cambridge |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will release the code and the dynamic 3D assets idx of the dataset for reproduction of the results. |
| Open Datasets | Yes | We curate a large-scale, high-quality dynamic 3D dataset sourced from the vast 3D data corpus of Objaverse-1.0 [10] and Objaverse-XL [9]. |
| Dataset Splits | No | The paper mentions a 'test set' but does not specify a 'validation set' or its split. |
| Hardware Specification | Yes | We use a valid batch size of 128 and train on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions specific pre-trained models like VideoMV [58], ModelScope T2V [42], and I2VGen-XL [56], but does not provide version numbers for general software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We train our 4D-aware video diffusion model for 6k iterations with a constant learning rate of 3×10⁻⁵. We use a valid batch size of 128 and train on 8 NVIDIA A100 GPUs. During the sampling stage, we use DDIM [37] sampling with 50 sampling steps, and w1 = 7.0 and w2 = 0.5 in classifier-free guidance. |
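
The sampling configuration in the last row (50 DDIM steps, two classifier-free guidance weights w1 = 7.0 and w2 = 0.5) can be made concrete with a short sketch. The sampler below is a minimal illustration, not the authors' released code: the model signature, the linear beta schedule, and in particular the rule for combining the two guidance weights are assumptions (written here as a two-branch composition in the style of multi-condition guidance).

```python
# Minimal sketch of deterministic DDIM sampling (eta = 0) with two-term
# classifier-free guidance, mirroring the reported setup: 50 steps,
# w1 = 7.0, w2 = 0.5. The model interface and the exact guidance
# composition below are illustrative assumptions.
import torch

def ddim_sample(model, shape, text_cond, image_cond,
                num_steps=50, w1=7.0, w2=0.5, device="cpu"):
    """`model(x, t, text, image)` is assumed to predict the noise eps.
    Assumed guidance rule:
        eps = eps_uncond
            + w1 * (eps_text  - eps_uncond)   # prompt guidance
            + w2 * (eps_image - eps_text)     # image/view guidance
    """
    # Linear beta schedule over 1000 training steps (a common default,
    # not stated in the paper).
    betas = torch.linspace(1e-4, 2e-2, 1000, device=device)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    timesteps = torch.linspace(999, 0, num_steps, device=device).long()
    x = torch.randn(shape, device=device)

    for i, t in enumerate(timesteps):
        # Three forward passes per step: unconditional, text-only, text+image.
        eps_uncond = model(x, t, None, None)
        eps_text = model(x, t, text_cond, None)
        eps_image = model(x, t, text_cond, image_cond)
        eps = (eps_uncond
               + w1 * (eps_text - eps_uncond)
               + w2 * (eps_image - eps_text))

        a_t = alphas_cumprod[t]
        a_prev = (alphas_cumprod[timesteps[i + 1]] if i + 1 < num_steps
                  else torch.tensor(1.0, device=device))

        # DDIM update: predict x0 from the guided noise, then step back.
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return x

if __name__ == "__main__":
    # Toy stand-in for the 4D-aware video diffusion network, just to run.
    toy = lambda x, t, text, image: torch.zeros_like(x)
    sample = ddim_sample(toy, (1, 4, 8, 32, 32), text_cond=None, image_cond=None)
    print(sample.shape)
```

Note that this guidance scheme costs three network evaluations per denoising step, which is why guidance weights and step count interact directly with the generation-efficiency claims in the table above.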