Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun MA, Zhou Zhao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos compared to previous methods. (Section 4: Experiment) |
| Researcher Affiliation | Collaboration | Zhejiang University & ByteDance & HKUST(GZ) |
| Pseudocode | No | The paper describes network structures and training processes but does not include explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | We provide detailed configuration and hyper-parameters in Appendix C, and will release the source code at https://real3dportrait.github.io in the future. |
| Open Datasets | Yes | To train the motion adapter and HTB-SR model, we use a high-fidelity talking face video dataset, CelebV-HQ (Zhu et al., 2022), which is about 65 hours long and contains 35,666 video clips at a resolution of 512×512, involving 15,653 identities. To train the A2M model, we use VoxCeleb2 (Chung et al., 2018), a low-fidelity but 2,000-hour-long large-scale lip-reading dataset, to guarantee the generalizability of the audio-to-motion mapping. |
| Dataset Splits | Yes | For the Same-Identity Reenactment, we randomly chose 100 videos in our preserved validation split of CelebV-HQ. (A hedged sampling sketch follows this table.) |
| Hardware Specification | Yes | All training processes of Real3D-Portrait are performed on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) are mentioned. |
| Experiment Setup | Yes | We provide detailed configuration and hyper-parameters in Appendix C, and will release the source code at https://real3dportrait.github.io in the future. |
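As a rough illustration of the evaluation subset described in the Dataset Splits row, below is a minimal sketch of drawing 100 clips at random from a preserved CelebV-HQ validation split. The file name, seed, and helper function are hypothetical, since the authors have not released their split lists or code; this is only a sketch of the stated procedure, not the paper's implementation.

```python
import random
from pathlib import Path

# Hypothetical: a text file listing the clip paths kept aside as the
# CelebV-HQ validation split (one path per line). The authors' actual
# split files are not published.
VAL_SPLIT_FILE = Path("celebvhq_val_split.txt")
NUM_EVAL_VIDEOS = 100  # the paper reports 100 randomly chosen videos
SEED = 0               # assumed; any fixed seed makes the selection reproducible


def sample_same_identity_eval_set() -> list[str]:
    """Randomly draw 100 clips from the preserved validation split."""
    clips = [line.strip() for line in VAL_SPLIT_FILE.read_text().splitlines() if line.strip()]
    rng = random.Random(SEED)
    return rng.sample(clips, NUM_EVAL_VIDEOS)


if __name__ == "__main__":
    for clip in sample_same_identity_eval_set():
        print(clip)
```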