Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Authors: Yikai Wang, Xinzhou Wang, Zilong Chen, Zhengyi Wang, Fuchun Sun, Jun Zhu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we provide an extensive evaluation of our method DGS (Sec. 3.2) with the initialization in Sec. 3.3, comparing both appearance and geometry against previous state-of-the-art methods. Additionally, we analyze the contributions of each proposed component in detail. For all qualitative and quantitative experiments, we follow the standard pipeline for dynamic reconstruction [58], to construct our evaluation setup by selecting every fourth frame as a training frame and designating the middle frame between each pair of training frames as a validation frame."
Researcher Affiliation | Academia | "Yikai Wang 1, Xinzhou Wang 1,2,3, Zilong Chen 1,2, Zhengyi Wang 1,2, Fuchun Sun 1, Jun Zhu 1,2. 1 Department of Computer Science and Technology, BNRist Center, Tsinghua University; 2 ShengShu; 3 College of Electronic and Information Engineering, Tongji University. yikaiw@outlook.com, wangxinzhou@tongji.edu.cn, dcszj@tsinghua.edu.cn"
Pseudocode | No | The paper describes its methods in detail using prose and mathematical equations but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | No | In the NeurIPS Paper Checklist, Section 5, it is stated: "Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We will release it after acceptance." This indicates the code is not yet publicly available.
Open Datasets | No | The paper evaluates on videos generated by an existing video generative model [4]. It provides no direct link, DOI, repository name, or formal citation that would give external access to the specific videos used for training and evaluation.
Dataset Splits | Yes | "For all qualitative and quantitative experiments, we follow the standard pipeline for dynamic reconstruction [58], to construct our evaluation setup by selecting every fourth frame as a training frame and designating the middle frame between each pair of training frames as a validation frame." (A minimal split sketch follows this table.)
Hardware Specification | Yes | "For each reconstruction, the overall training takes over 1 hour on an A800 GPU."
Software Dependencies | No | The paper mentions several models/frameworks such as NeRF [55], DINOv2 [56], and NeuS [87] as dependencies or comparison points, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA, or specific library versions).
Experiment Setup | Yes | "Our model configuration involves several key parameters to balance reconstruction and regularization losses. For the field initialization stage, we use a similar architecture with 8 layers for volume rendering as in NeRF [55], and initialize MLP for predicting SDF as an approximate unit sphere [101]. For the DGS stage, we initialize centers of the Gaussian surfels with the sampled surface points extracted from the neural SDF, and initialize the warping field by the forward field from the first stage. The dimension of the latent code embedding γ^t_b is set as 128. Following BANMo [98], we adopt 25 bones to optimize skinning weights." (See the initialization sketch below the table.)
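
The Dataset Splits row quotes a mechanical frame-selection rule, so it can be expressed directly in code. Below is a minimal sketch assuming zero-based frame indices; the function name `split_frames` is illustrative and not from the paper.

```python
def split_frames(num_frames: int):
    """Every fourth frame trains; the middle frame between each pair of
    consecutive training frames validates, per the quoted setup."""
    train = list(range(0, num_frames, 4))
    # The midpoint of training frames t and t + 4 is t + 2.
    val = [a + (b - a) // 2 for a, b in zip(train, train[1:])]
    return train, val

# Example: a 17-frame clip yields train [0, 4, 8, 12, 16] and val [2, 6, 10, 14].
```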
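The Experiment Setup row says Gaussian-surfel centers are initialized from surface points extracted from the neural SDF, but not how those points are obtained. The sketch below shows one plausible route, marching cubes over a dense SDF grid; `sdf_fn`, `init_surfel_centers`, the grid resolution, and the [-1, 1]^3 bound are all our assumptions, not the authors' implementation.

```python
import torch
from skimage import measure  # scikit-image

def init_surfel_centers(sdf_fn, resolution: int = 128, bound: float = 1.0):
    """Extract zero-level-set vertices of a neural SDF as candidate
    Gaussian-surfel centers (one plausible reading of the quoted setup)."""
    xs = torch.linspace(-bound, bound, resolution)
    grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1)
    with torch.no_grad():
        sdf = sdf_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Marching cubes returns vertices in voxel-index coordinates.
    verts, _, _, _ = measure.marching_cubes(sdf.cpu().numpy(), level=0.0)
    # Map voxel indices back to world coordinates in [-bound, bound]^3.
    verts = verts / (resolution - 1) * (2 * bound) - bound
    return torch.from_numpy(verts).float()
```

The remaining quoted settings (an 8-layer MLP as in NeRF, a 128-dimensional latent code γ^t_b, and 25 bones following BANMo) are scalar hyperparameters and need no code.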