Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

Authors: Jiajian Xie, Shengyu Zhang, Mengze Li, Chengfei Lv, Zhou Zhao, Fei Wu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method achieves more generalized and emotionally realistic talking face generation compared to previous methods. [...] 4 EXPERIMENT
Researcher Affiliation | Collaboration | 1Zhejiang University 2Alibaba
Pseudocode | No | The paper describes the methodology in prose and figures, but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Video samples and source code are available at https://ecoface1.github.io/
Open Datasets | Yes | To train our EDE, we use an emotional talking face video dataset, RAVDESS (Livingstone & Russo, 2018), which contains 1440 video clips of different actors speaking with 8 emotion categories. [...] videos from the HDTF (Zhang et al., 2021) [...] VOCASET (Cudeiro et al., 2019) and MEAD (Wang et al., 2020) datasets will be used for the evaluation.
Dataset Splits | Yes | A random selection of 80% of these datasets was used for training, 10% for validation, and 10% for testing.
Hardware Specification | Yes | All experiments are performed on a single NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or programming languages.
Experiment Setup | Yes | We employ the Adam Optimizer across all modules. The EDE is trained for 10,000 iterations, with the batch size set to 30. This training takes about 1 hour, using a learning rate of 5 * 10^-5. Furthermore, we use 30,000 iterations with a batch size of 50 and a learning rate of 5 * 10^-5, which took about 20 hours to train our EMG.
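The random 80%/10%/10% train/validation/test split reported above can be sketched as follows; this is a minimal illustration with a fixed seed for reproducibility, and the function and variable names are hypothetical, not taken from the paper's code:

```python
import random

def split_dataset(items, seed=0, train_frac=0.8, val_frac=0.1):
    """Randomly partition items into train/validation/test subsets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    items = list(items)            # copy so the caller's list is untouched
    rng.shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],                     # 80% training
            items[n_train:n_train + n_val],      # 10% validation
            items[n_train + n_val:])             # 10% testing (remainder)

train_set, val_set, test_set = split_dataset(range(100))
```

Assigning the test set as the remainder ensures every item lands in exactly one subset even when the fractions do not divide the dataset size evenly.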
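The quoted experiment setup can be summarized as a small configuration sketch; only the numeric values come from the paper, while the dictionary structure and key names are illustrative assumptions:

```python
# Training hyperparameters as reported in the paper; structure is illustrative.
TRAIN_CONFIG = {
    "EDE": {"optimizer": "Adam", "iterations": 10_000, "batch_size": 30, "lr": 5e-5},  # ~1 hour
    "EMG": {"optimizer": "Adam", "iterations": 30_000, "batch_size": 50, "lr": 5e-5},  # ~20 hours
}

# Total samples processed per module = iterations x batch size
samples_seen = {name: cfg["iterations"] * cfg["batch_size"]
                for name, cfg in TRAIN_CONFIG.items()}
```

Note that both modules share the same learning rate (5e-5) and optimizer; they differ only in iteration count and batch size, which is why the EMG's training time is roughly an order of magnitude longer.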