Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang, Xize Cheng, Xiang Yin, Zhou Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness. |
| Researcher Affiliation | Collaboration | Zhenhui Ye (1,2), Tianyun Zhong (1,2), Yi Ren (2), Ziyue Jiang (1,2), Jiawei Huang (1,2), Rongjie Huang (1), Jinglin Liu (2), Jinzheng He (1), Chen Zhang (2), Zehan Wang (1), Xize Cheng (1), Xiang Yin (2), Zhou Zhao (1); (1) Zhejiang University, (2) ByteDance |
| Pseudocode | No | The paper describes its methods through network diagrams and mathematical equations but does not include structured pseudocode or labeled algorithm blocks. |
| Open Source Code | Yes | Source code and video samples are available at https://mimictalk.github.io. |
| Open Datasets | Yes | To train the ICS-A2M model, we use a large-scale lip-reading dataset, VoxCeleb2 (Chung et al., 2018), which consists of about 2,000 hours of video from 6,112 celebrities. |
| Dataset Splits | Yes | For training efficiency, as shown in Fig. 4(a), we adapt the model on a 180-second-long clip as the training data and use the remaining 10-second clip as the validation set. |
| Hardware Specification | Yes | For the SD-Hybrid adaptation, we trained the model on 1 Nvidia A100 GPU with a batch size of 1 for 2,000 iterations in total, requiring about 8 GB of GPU memory and 0.26 hours. Regarding the ICS-A2M model, we trained it on 4 Nvidia A100 GPUs with a batch size of 20,000 mel frames per GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | We set the learning rate to 0.001, λ_LPIPS = 0.2, and λ_ID = 0.1. We provide detailed hyperparameter settings for the model configuration in Table 6. |
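The Dataset Splits row describes a 180-second adaptation clip followed by a 10-second validation clip. That split can be sketched as a contiguous, time-based slice of one video. This is a minimal sketch, assuming a frame rate of 25 fps and that both clips are cut from the start of a single video; neither detail is stated in the table, and `split_clip` is a hypothetical helper.

```python
# Hypothetical sketch: split one talking-head video into a 180 s adaptation
# clip and the 10 s validation clip that immediately follows it.
# fps = 25 is an assumption, not a value reported in the table.
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_clip(frames: Sequence[T], fps: int = 25,
               train_seconds: int = 180,
               val_seconds: int = 10) -> Tuple[List[T], List[T]]:
    """Return (train, val) frame lists taken contiguously from the start."""
    n_train = train_seconds * fps
    n_val = val_seconds * fps
    if len(frames) < n_train + n_val:
        raise ValueError(f"need at least {n_train + n_val} frames, got {len(frames)}")
    train = list(frames[:n_train])
    val = list(frames[n_train:n_train + n_val])  # no overlap with train
    return train, val


if __name__ == "__main__":
    video = list(range(190 * 25))  # stand-in for 190 s of frame indices
    train, val = split_clip(video)
    print(len(train), len(val))  # 4500 250
```

Keeping the validation clip strictly after the training clip avoids frame overlap between the two sets.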
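The Experiment Setup row reports loss weights λ_LPIPS = 0.2 and λ_ID = 0.1. A weighted objective using those values might be sketched as below. This is an illustration under stated assumptions: the un-weighted reconstruction term and the exact set of terms are assumptions, since the table lists only the two weights, and `total_loss` is a hypothetical helper.

```python
# Hypothetical sketch of a weighted training objective using the reported
# weights. The un-weighted reconstruction term is an assumption; the table
# only states lambda_LPIPS = 0.2 and lambda_ID = 0.1.
LAMBDA_LPIPS = 0.2
LAMBDA_ID = 0.1


def total_loss(rec_loss: float, lpips_loss: float, id_loss: float,
               lambda_lpips: float = LAMBDA_LPIPS,
               lambda_id: float = LAMBDA_ID) -> float:
    """Combine per-term losses as rec + lambda_lpips * lpips + lambda_id * id."""
    return rec_loss + lambda_lpips * lpips_loss + lambda_id * id_loss


if __name__ == "__main__":
    # Made-up loss values, purely to show the arithmetic.
    print(total_loss(rec_loss=0.5, lpips_loss=0.3, id_loss=0.2))
```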
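The Hardware Specification row gives the ICS-A2M batch size as 20,000 mel frames per GPU rather than as a fixed number of samples. A common way to build such batches is to group length-sorted samples under a frame budget. The sketch below does this greedily, assuming the budget counts padded frames (batch size times the longest sample in the batch). The table does not say how the authors batch, so `batch_by_frames` is a hypothetical helper, not their implementation.

```python
# Hypothetical frame-budget batching: group sample indices so that each
# batch's padded size (num_samples * longest_length) stays within max_frames.
from typing import List, Sequence


def batch_by_frames(lengths: Sequence[int], max_frames: int = 20_000) -> List[List[int]]:
    """Greedily batch indices sorted by length; oversized samples go alone."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        n = lengths[idx]  # ascending order, so n is the longest in the batch
        if current and (len(current) + 1) * n > max_frames:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(batch_by_frames([8000, 3000, 9000, 4000]))  # [[1, 3], [0, 2]]
```

With lengths `[8000, 3000, 9000, 4000]`, the padded sizes are 2 × 4,000 = 8,000 and 2 × 9,000 = 18,000 frames, both within the budget. Sorting by length first keeps padding small.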