Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Ziyue Jiang, Jiawei Huang, Rongjie Huang, Jinglin Liu, Jinzheng He, Chen Zhang, Zehan Wang, Xize Cheng, Xiang Yin, Zhou Zhao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness. |
| Researcher Affiliation | Collaboration | Zhenhui Ye 1,2, Tianyun Zhong 1,2, Yi Ren 2, Ziyue Jiang 1,2, Jiawei Huang 1,2, Rongjie Huang 1, Jinglin Liu 2, Jinzheng He 1, Chen Zhang 2, Zehan Wang 1, Xize Cheng 1, Xiang Yin 2, Zhou Zhao 1 (1 Zhejiang University, 2 ByteDance) |
| Pseudocode | No | The paper describes its methods through network diagrams and mathematical equations, but does not include structured pseudocode or labeled algorithm blocks. |
| Open Source Code | Yes | Source code and video samples are available at https://mimictalk.github.io. |
| Open Datasets | Yes | To train the ICS-A2M model, we use a large-scale lip-reading dataset, VoxCeleb2 (Chung et al., 2018), which consists of about 2,000 hours of video from 6,112 celebrities. |
| Dataset Splits | Yes | For training efficiency, as shown in Fig. 4(a), we adapt the model on a 180-second-long clip as the training data and use the last 10-second clip as the validation set. |
| Hardware Specification | Yes | For the SD-Hybrid adaptation, we trained the model on 1 Nvidia A100 GPU, with a batch size of 1 and 2,000 total iterations, requiring about 8 GB of GPU memory and 0.26 hours. For the ICS-A2M model, we trained on 4 Nvidia A100 GPUs, with a batch size of 20,000 mel frames per GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for the software libraries or frameworks used in its experiments. |
| Experiment Setup | Yes | We set the learning rate to 0.001, λLPIPS = 0.2, and λID = 0.1. Detailed hyper-parameter settings for the model configuration are provided in Table 6. |
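The reported experiment-setup values can be collected into a small configuration sketch. This is a hypothetical illustration only: the `AdaptationConfig` fields mirror numbers quoted above, while the weighted-sum loss form (photometric + λLPIPS·LPIPS + λID·ID) and the loss-term names are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class AdaptationConfig:
    """Hyper-parameters as reported in the paper's experiment setup."""
    learning_rate: float = 1e-3   # "learning rate to 0.001"
    lambda_lpips: float = 0.2     # λLPIPS
    lambda_id: float = 0.1        # λID
    batch_size: int = 1           # SD-Hybrid adaptation, 1 A100 GPU
    total_iterations: int = 2000


def total_loss(l_photometric: float, l_lpips: float, l_id: float,
               cfg: AdaptationConfig) -> float:
    """Assumed weighted sum: L = L_photo + λLPIPS * L_LPIPS + λID * L_ID."""
    return (l_photometric
            + cfg.lambda_lpips * l_lpips
            + cfg.lambda_id * l_id)


if __name__ == "__main__":
    cfg = AdaptationConfig()
    # Example values for the three terms; result ≈ 1.0 + 0.2*0.5 + 0.1*0.2
    print(total_loss(1.0, 0.5, 0.2, cfg))
```

Treat this as a reading aid for the table row, not as the released training code; the actual loss composition should be checked against the repository at https://mimictalk.github.io.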