Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
Authors: Jiajian Xie, Shengyu Zhang, Mengze Li, Chengfei Lv, Zhou Zhao, Fei Wu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method achieves more generalized and emotionally realistic talking face generation compared to previous methods. [...] 4 EXPERIMENT |
| Researcher Affiliation | Collaboration | 1Zhejiang University 2Alibaba |
| Pseudocode | No | The paper describes the methodology in prose and figures, but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Video samples and source code are available at https://ecoface1.github.io/ |
| Open Datasets | Yes | To train our EDE, we use an emotional talking face video dataset, RAVDESS (Livingstone & Russo, 2018), which contains 1440 video clips of different actors speaking with 8 emotion categories. [...] videos from the HDTF (Zhang et al., 2021) [...] VOCASET (Cudeiro et al., 2019) and MEAD (Wang et al., 2020) datasets will be used for the evaluation. |
| Dataset Splits | Yes | A random selection of 80% of these datasets was used for training, 10% for validation, and 10% for testing. |
| Hardware Specification | Yes | All experiments are performed on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or programming languages. |
| Experiment Setup | Yes | We employ the Adam Optimizer across all modules. The EDE is trained for 10,000 iterations, with the batch size set to 30. This training takes about 1 hour, using a learning rate of 5×10⁻⁵. Furthermore, we use 30,000 iterations with a batch size of 50 and a learning rate of 5×10⁻⁵, which took about 20 hours to train our EMG. |
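The reported dataset split and training hyperparameters can be sketched as follows. This is an illustrative reconstruction, not code from the paper: the helper name `split_indices`, the fixed seed, and the config dictionaries are assumptions; only the 80/10/10 ratios, optimizer, iteration counts, batch sizes, and learning rates come from the quoted excerpts.

```python
import random

# Hyperparameters as quoted from the paper (Adam optimizer for all modules).
# The dictionary layout itself is an assumption for illustration.
EDE_CONFIG = {"optimizer": "Adam", "iterations": 10_000, "batch_size": 30, "lr": 5e-5}
EMG_CONFIG = {"optimizer": "Adam", "iterations": 30_000, "batch_size": 50, "lr": 5e-5}


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split n sample indices into train/val/test (80/10/10 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed: illustrative, not from the paper
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# e.g. splitting the 1440 RAVDESS clips mentioned under "Open Datasets"
train_idx, val_idx, test_idx = split_indices(1440)
```

With 1440 clips this yields 1152 training, 144 validation, and 144 test samples, matching the stated 80/10/10 proportions.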