Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video

Authors: Yanqin Jiang, Li Zhang, Jin Gao, Weiming Hu, Yao Yao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed Consistent4D significantly outperforms previous 4D reconstruction approaches as well as per-frame 3D generation approaches, opening up new possibilities for 4D dynamic object generation from a single-view uncalibrated video. We have extensively evaluated our approach on both synthetic data and in-the-wild data.
Researcher Affiliation | Academia | Yanqin Jiang (1,2), Li Zhang (3), Jin Gao (1,2), Weiming Hu (1,2,6), Yao Yao (4,5). Affiliations: (1) State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) School of Data Science, Fudan University; (4) State Key Laboratory for Novel Software Technology, Nanjing University; (5) School of Intelligence Science and Technology, Nanjing University; (6) School of Information Science and Technology, ShanghaiTech University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | A project page (https://consistent4d.github.io) is provided, but it is a project overview page rather than an explicit statement of code release or a direct repository link for the method.
Open Datasets | Yes | For quantitative evaluation, we select and download seven animated models, namely Pistol, Guppie, Crocodile, Monster, Skull, Trump, Aurorus, from Sketchfab (ske, 2023) and render the multi-view videos by ourselves, as shown in Figure 3 and appendix A.3.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing.
Hardware Specification | Yes | The optimization of dynamic NeRF and video enhancer cost about 2.5 hours and 15 minutes on a single V100 GPU.
Software Dependencies | No | The paper mentions software such as zero123-xl, RIFE, pix2pix, DeepFloyd-IF, and threestudio, but does not provide version numbers for these dependencies, which are necessary for a reproducible description.
Experiment Setup | Yes | For Cascade DyNeRF, we set s = 2 in most experiments except for the last row in Table 1a, i.e., we have coarse-level and fine-level DyNeRFs. The spatial and temporal resolutions of Cascade DyNeRF are configured to 50 and 8 for the coarse level, and 100 and 16 for the fine level, respectively. We first train DyNeRF with batch size 4 and resolution 64 for 5000 iterations. Then we decrease the batch size to 1 and increase the resolution to 256 for the next 5000 iterations. ICL is employed in the initial 5000 iterations with a probability of 25%. We optimize the dynamic NeRF using Equation 8, where λ1 = 0.1, λ2 = 2500, λ3 = 500, λ4 = 50, λ5 = 2.0, and λ6 is initially 1 and increased to 20 linearly until 5000 iterations. (Hypothetical Python sketches of this configuration and of the λ6 schedule follow the table.)
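
The Experiment Setup row maps directly onto a small configuration object. The sketch below is a minimal, hypothetical Python rendering that only records the hyperparameters quoted above; the class and field names (CascadeLevel, Consistent4DTrainingConfig, stage1_batch_size, and so on) are our own and do not come from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class CascadeLevel:
    """Resolution of one level of the Cascade DyNeRF."""
    spatial_resolution: int
    temporal_resolution: int


@dataclass
class Consistent4DTrainingConfig:
    """Hyperparameters quoted from the paper; names here are hypothetical."""
    # Cascade DyNeRF: s = 2 levels (coarse + fine) in most experiments.
    cascade_levels: tuple = (
        CascadeLevel(spatial_resolution=50, temporal_resolution=8),    # coarse
        CascadeLevel(spatial_resolution=100, temporal_resolution=16),  # fine
    )
    # Stage 1: batch size 4, render resolution 64, first 5000 iterations.
    stage1_batch_size: int = 4
    stage1_resolution: int = 64
    stage1_iterations: int = 5000
    # Stage 2: batch size 1, render resolution 256, next 5000 iterations.
    stage2_batch_size: int = 1
    stage2_resolution: int = 256
    stage2_iterations: int = 5000
    # ICL is applied only during the initial 5000 iterations, 25% of the time.
    icl_iterations: int = 5000
    icl_probability: float = 0.25
```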
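Equation 8 itself is not reproduced in this report, but its weighting is simple to state: five fixed weights plus a λ6 that ramps linearly from 1 to 20 over the first 5000 iterations and then stays constant. Below is a sketch of that schedule under those assumptions; the individual loss terms are placeholders (indexed 1 through 6), since Equation 8's terms are not quoted here.

```python
# Fixed weights for Equation 8, as quoted above (lambda_1 .. lambda_5).
LAMBDAS = {1: 0.1, 2: 2500.0, 3: 500.0, 4: 50.0, 5: 2.0}


def lambda6(iteration: int, start: float = 1.0, end: float = 20.0,
            ramp_iters: int = 5000) -> float:
    """Linear warm-up of lambda_6 from 1 to 20 over the first 5000 iterations."""
    t = min(max(iteration / ramp_iters, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * t


def total_loss(losses: dict, iteration: int) -> float:
    """Weighted sum of the six loss terms; `losses` maps term index -> value."""
    weighted = sum(LAMBDAS[i] * losses[i] for i in LAMBDAS)
    return weighted + lambda6(iteration) * losses[6]
```

For instance, lambda6(0) returns 1.0, lambda6(2500) returns 10.5 (halfway up the ramp), and any iteration past 5000 stays at 20.0.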