Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches.
Researcher Affiliation | Collaboration | Yanqin Jiang (1,2), Chaohui Yu (3,4), Chenjie Cao (3,4), Fan Wang (3,4), Weiming Hu (1,2,5), Jin Gao (1,2). (1) State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), CASIA; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences; (3) DAMO Academy, Alibaba Group; (4) Hupan Lab; (5) School of Information Science and Technology, ShanghaiTech University
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Data, code, and models are open-released.
Open Datasets | Yes | To train our MV-VDM, we build a large-scale multi-view video dataset, MV-Video. ... We will release this dataset to further advance the field of 4D generative research.
Dataset Splits | Yes | Following the evaluation setting of VBench [23], we use four different random seeds for each object and report the average results.
Hardware Specification | Yes | It costs 3 days to train our MV-VDM on 32 80G A800 GPUs, and the optimization for 4D generation takes around 30 minutes on a single A800 GPU per object.
Software Dependencies | No | The paper mentions an optimizer (AdamW) and a method (FreeInit), but does not provide version numbers for general software dependencies such as programming languages or deep learning frameworks.
Experiment Setup | Yes | We use the AdamW optimizer with a learning rate of 4e-4 and a weight decay of 0.01, and train the model for 20 epochs with a batch size of 2048. ... The learning rate is 0.0015 initially and decreases linearly to 0.0005 at the end of the reconstruction stage. λ1, λ2, and λ3 in Eq. 11 are set to 10.0, 0.01, and 0.5, respectively.
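
As a reading aid, the hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch configuration roughly as sketched below. This is a minimal sketch, not the authors' implementation: the tiny placeholder model, the placeholder 4D parameters, and the iteration count are assumptions; only the numeric values (AdamW, lr 4e-4, weight decay 0.01, 20 epochs, batch size 2048, the 1.5e-3 to 5e-4 linear decay, and the λ weights) come from the paper.

```python
import torch
import torch.nn as nn

# --- MV-VDM training stage (placeholder model, not the real architecture) ---
mv_vdm = nn.Linear(16, 16)                       # stand-in for MV-VDM
train_opt = torch.optim.AdamW(mv_vdm.parameters(), lr=4e-4, weight_decay=0.01)
epochs, batch_size = 20, 2048                    # quoted training schedule

# --- Per-object 4D reconstruction stage ---
# Learning rate decays linearly from 1.5e-3 to 5e-4 over the stage; the number
# of iterations below is an assumption, not a value reported in the paper.
gaussian_params = [nn.Parameter(torch.zeros(1024, 3))]  # placeholder 4D representation
recon_opt = torch.optim.Adam(gaussian_params, lr=1.5e-3)
num_iters = 1000                                 # assumed
lr_schedule = torch.optim.lr_scheduler.LinearLR(
    recon_opt, start_factor=1.0, end_factor=5e-4 / 1.5e-3, total_iters=num_iters)

# Loss weights λ1, λ2, λ3 from Eq. 11; their exact roles are defined in the paper.
lambda_1, lambda_2, lambda_3 = 10.0, 0.01, 0.5
```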