GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Authors: Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method achieves more generalized and high-fidelity talking face generation compared to previous methods.
Researcher Affiliation | Collaboration | Zhenhui Ye1, Ziyue Jiang1, Yi Ren2, Jinglin Liu1, Jinzheng He1, Zhou Zhao1; 1School of Computer Science and Technology, Zhejiang University {zhenhuiye,jiangziyue,jinglinliu,jinzhenghe,zhaozhou}@zju.edu.cn; 2ByteDance ren.yi@bytedance.com
Pseudocode | No | The paper includes architectural diagrams and descriptions of the model components but does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Video samples and source code are available at https://geneface.github.io
Open Datasets | Yes | To learn robust audio-to-motion mapping, a large-scale lipreading corpus is needed. Hence we use LRS3-TED (Afouras et al., 2018) to train our variational generator and post-net. Additionally, a certain person's speaking video of a few minutes in length with an audio track is needed to learn a NeRF-based person portrait renderer. To be specific, in order to compare with the state-of-the-art methods, we utilize the datasets of Lu et al. (2021) and Guo et al. (2021), which consist of 5 videos with an average length of 6,000 frames at 25 fps.
Dataset Splits | No | The paper mentions using LRS3-TED and other datasets for training and testing, and uses the term 'test' in its evaluation. However, it does not provide explicit training, validation, and test split details (e.g., percentages or exact counts) for the datasets used, so the splits cannot be reproduced.
Hardware Specification | Yes | We train the GeneFace on 1 NVIDIA RTX 3090 GPU, and the detailed training hyper-parameters of the variational generator, post-net, and NeRF-based renderer are listed in Appendix B.
Software Dependencies | No | The paper lists model configurations and hyperparameters in Appendix B.1, but it does not specify any software dependencies (e.g., libraries, frameworks, or operating systems) with version numbers.
Experiment Setup | Yes | Implementation Details. We train the GeneFace on 1 NVIDIA RTX 3090 GPU, and the detailed training hyper-parameters of the variational generator, post-net, and NeRF-based renderer are listed in Appendix B. For the variational generator and post-net, it takes about 40k and 12k steps to converge (about 12 hours). For the NeRF-based renderer, we train each model for 800k iterations (400k for the head and 400k for the torso, respectively), which takes about 72 hours. Table 4: Hyper-parameter list
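For reference, the quoted figures imply roughly 4 minutes of footage per person-specific video (6,000 frames at 25 fps is 240 seconds) and a two-phase training budget. The snippet below is a minimal sketch in plain Python that only encodes the step counts, wall-clock times, and GPU quoted above; the Stage dataclass and the stage names (variational_generator, post_net, nerf_head, nerf_torso) are illustrative assumptions, not identifiers from the official GeneFace code.

```python
# Minimal sketch of the training budget quoted in the "Experiment Setup" row.
# Stage names and the Stage dataclass are hypothetical; only the numbers
# (steps, iterations, rough wall-clock hours, 1x RTX 3090) come from the paper.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    steps: int  # reported training steps / iterations


# Audio-to-motion phase: ~40k steps (variational generator) + ~12k steps
# (post-net), converging in about 12 hours on a single NVIDIA RTX 3090.
AUDIO_TO_MOTION = [Stage("variational_generator", 40_000), Stage("post_net", 12_000)]

# Rendering phase: 800k NeRF iterations total, split 400k head / 400k torso,
# taking about 72 hours on the same GPU.
RENDERER = [Stage("nerf_head", 400_000), Stage("nerf_torso", 400_000)]

if __name__ == "__main__":
    phases = [("audio-to-motion", AUDIO_TO_MOTION, 12), ("NeRF renderer", RENDERER, 72)]
    for label, stages, hours in phases:
        total = sum(s.steps for s in stages)
        print(f"{label}: {total:,} steps total, ~{hours} h on 1x NVIDIA RTX 3090")
```

Running the script simply prints the two phase totals (52,000 and 800,000 steps), matching the roughly 12-hour and 72-hour training times reported in the excerpt.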