Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Authors: Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on standard benchmarks demonstrate that our model outperforms existing GAN-based and diffusion-based models. We also provide comprehensive ablation studies and user study results. In experiments, our framework achieves state-of-the-art performance on HDTF dataset (Zhang et al. 2021), surpassing GAN-based (Prajwal et al. 2020; Zhou et al. 2021) and diffusion-based (Ma et al. 2023; Wei, Yang, and Wang 2024) approaches. |
| Researcher Affiliation | Collaboration | Seyeon Kim 1, 2*, Siyoon Jin 1*, Jihye Park 1, 2*, Kihong Kim 3, Jiyoung Kim 1, Jisu Nam 4, Seungryong Kim 4 1Korea University 2Samsung Electronics 3VIVE STUDIOS 4KAIST |
| Pseudocode | No | The paper describes its methodology in prose and mathematical formulations but does not include any distinct pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/cvlab-kaist/MoDiTalker |
| Open Datasets | Yes | We used the LRS3-TED (Afouras, Chung, and Zisserman 2018) and HDTF (Zhang et al. 2021) datasets to train our AToM and MToV models, respectively. |
| Dataset Splits | Yes | For MToV, we randomly selected 312 videos from the HDTF dataset for training, using remaining 98 videos for testing. |
| Hardware Specification | Yes | For all experiments, we used single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software components like HuBERT and 3DMM but does not provide specific version numbers for these or other key software dependencies required for replication. |
| Experiment Setup | Yes | For AToM, we train the model for 300k iterations with a learning rate of 1e-4. For MToV, we train the model for 600k iterations with a learning rate of 1e-4. To alleviate jittering, we employed a blending technique using Gaussian blur, as described in (Chen et al. 2020). Additional implementation details are provided in the Appendix 1. |
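
The Dataset Splits row reports a random 312/98 train/test split of HDTF videos, but the quoted text gives neither a seed nor the resulting ID lists. As a minimal sketch of how such a split could be made repeatable, the snippet below shuffles a sorted list of IDs with a fixed seed. The function name `split_videos`, the seed value, and the placeholder video IDs are assumptions for illustration only; this is not the authors' procedure.

```python
import random


def split_videos(video_ids, n_train=312, seed=0):
    """Randomly split video IDs into (train, test) lists.

    The seed is a hypothetical choice; the quoted text does not report one.
    """
    ids = sorted(video_ids)  # sort first so the split does not depend on input order
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Placeholder IDs standing in for the 410 HDTF videos (312 + 98).
video_ids = [f"video_{i:03d}" for i in range(410)]
train_ids, test_ids = split_videos(video_ids)
print(len(train_ids), len(test_ids))
```

Publishing the two ID lists alongside the code would remove the dependence on the seed and on the random number generator entirely.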