What Does Your Face Sound Like? 3D Face Shape towards Voice

Authors: Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments and subjective tests demonstrate our method can generate utterances matching faces well, with good audio quality and voice diversity. We also explore and visualize how the voice changes with the face. Case studies show that our method upgrades the face-voice inference to personalized custom-made voice creating, revealing a promising prospect in virtual human and dubbing applications.
Researcher Affiliation | Collaboration | Zhihan Yang1, Zhiyong Wu1, Ying Shan2, Jia Jia3* (1 Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; 2 Applied Research Center (ARC), Tencent PCG, Shenzhen 518054, China; 3 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Pseudocode | No | The paper describes the proposed framework and methodology in narrative text and diagrams (Figure 1) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper uses and links to third-party open-source tools (Kaldi, Face-detection-feature-extraction, ESPnet) but does not state that the code for its own proposed framework is open source, nor does it provide a link to such code.
Open Datasets | Yes | The first part of the dataset comes from VoxCeleb2 (Chung, Nagrani, and Zisserman 2018) and VGGFace2 (Cao et al. 2018a). Additionally, we utilize ChaLearn LAP (Ponce-López et al. 2016) video dataset, containing videos of more than 30,000 clips.
Dataset Splits | Yes | We split the first 1,200 and 525 speakers off VoxCeleb2 and ChaLearn LAP for validation, remaining 4,795 and 2,009 speakers for training respectively. (A sketch of this split appears below the table.)
Hardware Specification | Yes | We train our model on an NVIDIA GeForce 2080 Ti for 50 epochs, with a batch size of 64.
Software Dependencies | No | The paper mentions several software components, such as 'Kaldi', the 'VGG-19 model', and 'Conformer-FastSpeech 2', but it does not provide specific version numbers for these dependencies or for the underlying frameworks used (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | We train our model on an NVIDIA GeForce 2080 Ti for 50 epochs, with a batch size of 64. We adopt the Adam optimizer with a learning rate of 0.002. (This configuration is sketched below the table.)
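
The speaker-level split reported in the Dataset Splits row is simple enough to restate as code. The sketch below is a hypothetical reconstruction, not the authors' code (none is released); the speaker ID formats and their ordering are assumptions, and only the split counts (1,200 / 4,795 for VoxCeleb2 and 525 / 2,009 for ChaLearn LAP) come from the paper.

```python
# Hypothetical reconstruction of the speaker-level split described in the
# "Dataset Splits" row. ID formats and ordering are assumptions.

def split_speakers(speaker_ids, num_val):
    """Take the first num_val speakers for validation, the rest for training."""
    return speaker_ids[:num_val], speaker_ids[num_val:]

# Placeholder speaker ID lists; in practice these would come from the
# VoxCeleb2 and ChaLearn LAP metadata.
voxceleb2_speakers = [f"id{n:05d}" for n in range(5995)]   # 1,200 + 4,795
chalearn_speakers = [f"spk{n:04d}" for n in range(2534)]   # 525 + 2,009

vox_val, vox_train = split_speakers(voxceleb2_speakers, 1200)
chalearn_val, chalearn_train = split_speakers(chalearn_speakers, 525)

print(len(vox_val), len(vox_train))          # 1200 4795
print(len(chalearn_val), len(chalearn_train))  # 525 2009
```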
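
Similarly, the training settings quoted in the Hardware Specification and Experiment Setup rows (Adam optimizer, learning rate 0.002, batch size 64, 50 epochs) can be illustrated with a minimal PyTorch-style loop. Everything except those hyperparameters is an assumption: the placeholder model, the random tensors standing in for face-shape features and speaker embeddings, and the MSE loss are hypothetical stand-ins, since the paper's actual architecture and training objective are not available as code.

```python
# Minimal sketch of the reported training configuration; only the
# hyperparameters (Adam, lr 0.002, batch size 64, 50 epochs) come from
# the paper. Model, data, and loss are hypothetical placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for face-shape features and target speaker embeddings
# (feature dimensions are assumptions, not from the paper).
faces = torch.randn(1024, 257)
targets = torch.randn(1024, 256)
loader = DataLoader(TensorDataset(faces, targets), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 256))
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
criterion = nn.MSELoss()

for epoch in range(50):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```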