DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

Authors: Zhiqi Li, Yiming Chen, Peidong Liu

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate superior performance of our method in terms of both rendering quality and spatial-temporal consistency.
Researcher Affiliation Academia Zhiqi Li (1,2), Yiming Chen (1,2), Peidong Liu (2); 1 Zhejiang University, 2 Westlake University; {lizhiqi49, chenyiming, liupeidong}@westlake.edu.cn
Pseudocode No No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code Yes The source code is available at our website: https://lizhiqi49.github.io/DreamMesh4D.
Open Datasets Yes Dataset: Our quantitative results are evaluated on the test dataset provided by Consistent4D [12], which contains seven multi-view videos.
Dataset Splits No The paper uses 'the test dataset provided by Consistent4D [12]' and states 'Each video has one input view for scene generation and four testing views for evaluation.' It does not explicitly define training, validation, or test splits by percentage or sample count, nor does it mention a specific validation set.
Hardware Specification Yes All of our experiments are conducted on a single NVIDIA RTX 4090 GPU.
Software Dependencies No Table 3 lists software assets such as threestudio, diffusers, pytorch3d, zero123, and SuGaR along with their GitHub URLs and licenses, but it does not specify concrete version numbers for these dependencies, only general project names.
Experiment Setup Yes In both coarse mesh generation and SuGaR refinement, the loss in Equation 3 is used for supervision, with the strengths of the different terms set as λ_SDS^s = 0.1, λ_ref^s = 1000 and λ_mask^s = 500. For the generation of the coarse mesh, a set of randomly initialized 3D Gaussians is optimized for 3000 steps in total. In the first 1500 steps, we perform densification and pruning every 100 steps. After these 1500 steps, densification and pruning are stopped, and we introduce the additional opacity binarization and density regularization terms described in [9] into the optimization until step 3000. Finally, we prune all Gaussians with opacity less than 0.5 and extract the coarse mesh through Poisson reconstruction. Afterwards, we bind x = 6 new flat Gaussians to each triangle face of the coarse mesh and optimize for 2000 steps. In the dynamic stage, we by default sample N_node = 1024 control nodes and assign N_neighbor = 4 neighboring nodes to each vertex when constructing the deformation graph. For each training step, 8 frames are randomly sampled from the input video for supervision, and for each sampled timestamp we randomly sample 2 views for the calculation of the SDS loss. All images are rendered at resolution 512×512 with a white background. The camera distance to the world center is fixed at 3.8 and the field-of-view (FoV) is fixed at 20°. As for the strengths of the different loss terms, we by default set λ_SDS = 0.1, λ_ref = 5000, λ_mask = 500 and λ_NC = 10. The value of λ_ARAP is chosen case-specifically in [1, 10] according to the motion amplitude of the object. The deformation network is zero-initialized and optimized for 2000 steps in total with a learning rate of 0.00032.
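
To make the reported setup easier to scan, the following is a minimal sketch that collects the quoted hyperparameters into a Python configuration; the key names and the static/dynamic grouping are illustrative assumptions and do not correspond to the released DreamMesh4D codebase.

# Hypothetical configuration collecting the hyperparameters quoted in the
# "Experiment Setup" row above. Key names are illustrative only.
static_stage = {
    "loss_weights": {"sds": 0.1, "ref": 1000.0, "mask": 500.0},  # λ^s terms in Equation 3
    "coarse_gaussian_steps": 3000,        # total optimization steps for the coarse mesh
    "densify_prune_until": 1500,          # densification/pruning only in the first 1500 steps
    "densify_prune_interval": 100,        # every 100 steps
    "opacity_prune_threshold": 0.5,       # prune Gaussians below this opacity before Poisson reconstruction
    "gaussians_per_face": 6,              # flat Gaussians bound to each triangle face
    "refine_steps": 2000,                 # SuGaR refinement steps
}

dynamic_stage = {
    "num_control_nodes": 1024,            # N_node sampled for the deformation graph
    "neighbors_per_vertex": 4,            # N_neighbor control nodes assigned to each vertex
    "frames_per_step": 8,                 # frames sampled from the input video per training step
    "views_per_timestamp": 2,             # views sampled per timestamp for the SDS loss
    "render_resolution": 512,             # images rendered at 512x512 with white background
    "camera_distance": 3.8,               # distance from camera to world center
    "fov_degrees": 20.0,
    "loss_weights": {"sds": 0.1, "ref": 5000.0, "mask": 500.0, "normal_consistency": 10.0},
    "arap_weight_range": (1.0, 10.0),     # λ_ARAP chosen per case by motion amplitude
    "deform_steps": 2000,                 # total steps for the zero-initialized deformation network
    "deform_lr": 3.2e-4,
}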