Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ProsodyTalker: 3D Visual Speech Animation via Prosody Decomposition
Authors: Zonglin Li, Xiaoqian Lv, Qinglin Liu, Quanling Meng, Xin Sun, Shengping Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method achieves more realistic animation than state-of-the-art methods. ... Quantitative Evaluation: To quantitatively evaluate lip synchronization, we employ two established metrics (e.g., lip vertex error (LVE) and upper face dynamics deviation (FDD)) and report the average across all sequences. ... Ablation Study: Impact of the Perturbation in Content Encoding. ... The group without head poses includes FaceFormer, CodeTalker, SelfTalk, our method in zero-posed, and the ground truth. |
| Researcher Affiliation | Academia | Harbin Institute of Technology EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology in detail using text and figures, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Existing 4D face datasets (e.g., VOCASET (Cudeiro et al. 2019) and BIWI (Fanelli et al. 2010)) generally lack head pose variations... Concurrently, 2D talking face datasets such as MEAD (Wang et al. 2020) and HDTF (Zhang et al. 2021b) only provide the audio and video files, without incorporating 3D facial parameters. |
| Dataset Splits | No | For the VOCASET-Test and 3DTH-Test datasets, we select 20 samples from each dataset for each of the 7 comparison types. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running experiments. |
| Software Dependencies | No | The paper refers to various existing models and methods (e.g., EMOCA-v2, NANSY, FaceFormer) and mentions general concepts like convolution layers and transformers, but it does not specify versions of programming languages, libraries, or frameworks used for its implementation. |
| Experiment Setup | No | The paper describes the model architecture, loss functions, and general training approaches, but it does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer configurations. |
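The notice above states that the LLM-based labels are validated against a manually labeled dataset. A minimal sketch of that validation step, assuming per-paper label dictionaries (function and variable names here are illustrative, not from the actual pipeline):

```python
# Hypothetical sketch: measuring agreement between automated LLM labels
# and a manually labeled reference set, as described in the notice.

def label_accuracy(llm_labels: dict, manual_labels: dict) -> float:
    """Fraction of commonly labeled items where the LLM label
    matches the manual label."""
    keys = manual_labels.keys() & llm_labels.keys()
    if not keys:
        return 0.0
    matches = sum(llm_labels[k] == manual_labels[k] for k in keys)
    return matches / len(keys)

# Toy example: labels for one reproducibility variable across three papers.
llm = {"paper_a": "Yes", "paper_b": "No", "paper_c": "Yes"}
manual = {"paper_a": "Yes", "paper_b": "Yes", "paper_c": "Yes"}
print(label_accuracy(llm, manual))  # 2 of 3 labels agree
```

In practice such a comparison would be run per reproducibility variable, and the resulting accuracy metrics reported, as [1] describes; this snippet only illustrates the agreement computation itself.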