MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang, Xize Cheng, Xiang Yin, Zhou Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness. |
| Researcher Affiliation | Collaboration | Zhenhui Ye (1,2), Tianyun Zhong (1,2), Yi Ren (2), Ziyue Jiang (1,2), Jiawei Huang (1,2), Rongjie Huang (1), Jinglin Liu (2), Jinzheng He (1), Chen Zhang (2), Zehan Wang (1), Xize Cheng (1), Xiang Yin (2), Zhou Zhao (1); 1 Zhejiang University, 2 ByteDance |
| Pseudocode | No | The paper describes methods through network diagrams and mathematical equations, but does not include structured pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | Source code and video samples are available at https://mimictalk.github.io. |
| Open Datasets | Yes | To train the ICS-A2M model, we use a large-scale lip-reading dataset, VoxCeleb2 (Chung et al., 2018), which consists of about 2,000 hours of videos from 6,112 celebrities. |
| Dataset Splits | Yes | For training efficiency, as shown in Fig. 4(a), we adapt the model on a 180-second-long clip as the training data and use the last 10-second clip as the validation set. (See the split sketch after the table.) |
| Hardware Specification | Yes | For the SD-Hybrid adaptation, we trained the model on 1 Nvidia A100 GPU, with a batch size of 1 and total iterations of 2,000, requiring about 8 GB of GPU memory and 0.26 hours. Regarding the ICS-A2M model, we trained it on 4 Nvidia A100 GPUs, with a batch size of 20,000 mel frames per GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | We set the learning rate to 0.001, λ_LPIPS = 0.2, λ_ID = 0.1. We provide detailed hyper-parameter settings about the model configuration in Table 6. (See the loss sketch after the table.) |
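
As a reading aid for the Dataset Splits row, here is a minimal sketch of the reported 180 s / 10 s adaptation split, assuming a single talking-head video decoded at 25 fps into a frame array. The frame rate, array layout, and function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

FPS = 25            # assumed video frame rate
TRAIN_SECONDS = 180  # adaptation (training) clip length from the paper
VAL_SECONDS = 10     # validation clip length from the paper

def split_clip(frames: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (T, H, W, C) frame array into train/val segments."""
    n_train = TRAIN_SECONDS * FPS
    n_val = VAL_SECONDS * FPS
    assert len(frames) >= n_train + n_val, "clip shorter than 190 s"
    return frames[:n_train], frames[n_train:n_train + n_val]
```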
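Similarly, the Experiment Setup row implies a weighted multi-term adaptation objective. The sketch below assembles it with the reported weights λ_LPIPS = 0.2 and λ_ID = 0.1 and learning rate 0.001; the choice of an L1 pixel term and the `lpips_fn`/`id_fn` callables are assumptions, since the exact term definitions are given in the paper's method section and Table 6.

```python
import torch
import torch.nn.functional as F

LAMBDA_LPIPS = 0.2  # perceptual-loss weight from the paper
LAMBDA_ID = 0.1     # identity-loss weight from the paper

def total_loss(pred, target, lpips_fn, id_fn):
    """Weighted sum of reconstruction, perceptual, and identity terms."""
    l_recon = F.l1_loss(pred, target)          # assumed pixel-level term
    l_lpips = lpips_fn(pred, target).mean()    # perceptual (LPIPS) term
    l_id = id_fn(pred, target).mean()          # identity-preservation term
    return l_recon + LAMBDA_LPIPS * l_lpips + LAMBDA_ID * l_id

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

In practice, `lpips_fn` could be a standard perceptual module such as `lpips.LPIPS(net='alex')`, and `id_fn` a face-recognition embedding distance; the paper does not specify these choices, so they are stand-ins here.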