Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ProsodyTalker: 3D Visual Speech Animation via Prosody Decomposition

Authors: Zonglin Li, Xiaoqian Lv, Qinglin Liu, Quanling Meng, Xin Sun, Shengping Zhang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Extensive experiments show that our method achieves more realistic animation than state-of-the-art methods. ... Quantitative Evaluation: To quantitatively evaluate lip synchronization, we employ two established metrics, lip vertex error (LVE) and upper-face dynamics deviation (FDD), and report the average across all sequences. ... Ablation Study: Impact of the Perturbation in Content Encoding. ... The group without head poses includes FaceFormer, CodeTalker, SelfTalk, our method in zero-pose, and the ground truth.
Researcher Affiliation | Academia | Harbin Institute of Technology, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes its methodology in detail using text and figures, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | Existing 4D face datasets (e.g., VOCASET (Cudeiro et al. 2019) and BIWI (Fanelli et al. 2010)) generally lack head pose variations... Concurrently, 2D talking face datasets such as MEAD (Wang et al. 2020) and HDTF (Zhang et al. 2021b) only provide the audio and video files, without incorporating 3D facial parameters.
Dataset Splits | No | For the VOCASET-Test and 3DTH-Test datasets, we select 20 samples from each dataset for each of the 7 comparison types.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for running experiments.
Software Dependencies | No | The paper refers to various existing models and methods (e.g., EMOCA-v2, NANSY, FaceFormer) and mentions general concepts like convolutional layers and transformers, but it does not specify the versions of programming languages, libraries, or frameworks used for its implementation.
Experiment Setup | No | The paper describes the model architecture, loss functions, and general training approaches, but it does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer configurations.
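
For readers unfamiliar with the two lip-sync metrics quoted in the Research Type row, the sketch below implements LVE and FDD as they are commonly defined in the 3D talking-head literature (following FaceFormer and CodeTalker): LVE is the per-frame maximal L2 error over lip vertices, averaged across frames, and FDD compares the temporal standard deviation of upper-face vertex motion between prediction and ground truth. The paper's exact vertex masks and any normalization are not given in this report, so all names, shapes, and masks here are illustrative assumptions, not ProsodyTalker's implementation.

```python
# Hedged sketch of LVE and FDD under common definitions from the 3D
# talking-head literature; masks and shapes below are illustrative only.
import numpy as np

def lip_vertex_error(pred, gt, lip_mask):
    """LVE: maximal L2 error over lip vertices per frame, averaged over frames.

    pred, gt: (T, V, 3) vertex sequences; lip_mask: boolean (V,) lip region.
    """
    err = np.linalg.norm(pred[:, lip_mask] - gt[:, lip_mask], axis=-1)  # (T, L)
    return float(err.max(axis=1).mean())

def upper_face_dynamics_deviation(pred, gt, upper_mask):
    """FDD: difference in temporal std of upper-face vertex motion magnitude
    between prediction and ground truth, averaged over upper-face vertices.
    """
    def dyn(seq):
        motion = np.linalg.norm(seq[:, upper_mask], axis=-1)  # (T, U) magnitudes
        return motion.std(axis=0)                             # (U,) temporal std
    return float((dyn(pred) - dyn(gt)).mean())

# Toy usage on random meshes; 5023 vertices matches FLAME topology, but the
# masks are arbitrary placeholders, not the paper's regions.
T, V = 120, 5023
rng = np.random.default_rng(0)
pred, gt = rng.normal(size=(T, V, 3)), rng.normal(size=(T, V, 3))
lip_mask = np.zeros(V, dtype=bool)
lip_mask[:254] = True            # hypothetical lip-region indices
upper_mask = ~lip_mask           # hypothetical upper-face indices
print(lip_vertex_error(pred, gt, lip_mask))
print(upper_face_dynamics_deviation(pred, gt, upper_mask))
```

Note that FDD is signed under this definition: positive values indicate the predicted upper face is more dynamic than the ground truth, negative values less so.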