FG-EmoTalk: Talking Head Video Generation with Fine-Grained Controllable Facial Expressions

Authors: Zhaoxu Sun, Yuze Xuan, Fang Liu, Yang Xiang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show our method achieves fine-grained expression control, produces high-quality talking head videos and outperforms baseline methods.
Researcher Affiliation | Collaboration | Zhaoxu Sun (1), Yuze Xuan (1), Fang Liu (2)*, Yang Xiang (1); (1) Xiaobing.ai; (2) State Key Laboratory of Media Convergence and Communication, Communication University of China
Pseudocode | No | The paper describes its method in text and with diagrams (Figure 2), but does not provide a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | We use the HDTF (Zhang et al. 2021b) and CelebV-HQ (Zhu et al. 2022) datasets... Moreover, the MEAD dataset (Wang et al. 2020)... We used the DISFA dataset (Mavadati et al. 2013)...
Dataset Splits | No | The paper mentions selecting 2,000 HDTF videos not in the training set for evaluation and 2,000 MEAD videos for testing, but it does not explicitly specify a validation set, the training/validation/test proportions for all datasets, or the size of the training set.
Hardware Specification | Yes | All experiments were conducted with 4 NVIDIA Tesla A10 GPUs.
Software Dependencies | No | The paper states 'We implemented our framework in Pytorch' but does not provide specific version numbers for PyTorch or for other components such as Wav2Vec2, Gated-GCN, or GFP-GAN.
Experiment Setup | Yes | We used the Adam optimizer with a learning rate of 0.002. The hyperparameters λ_app, λ_exp, and λ_per were set to 100.0, 100.0, and 10.0 respectively in the training stage.
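
To make the quoted training configuration concrete, below is a minimal PyTorch sketch of the reported optimizer and loss weighting. Only the Adam learning rate (0.002) and the weights λ_app = 100.0, λ_exp = 100.0, λ_per = 10.0 come from the paper; the placeholder generator module and the individual loss terms are assumptions, since the authors release no code.

```python
import torch
import torch.nn as nn

# Placeholder generator: the real FG-EmoTalk architecture is not released,
# so a tiny module stands in only to make the optimizer call concrete.
generator = nn.Linear(128, 128)

# Optimizer settings quoted in the paper: Adam with a learning rate of 0.002.
optimizer = torch.optim.Adam(generator.parameters(), lr=0.002)

# Loss weights quoted in the paper.
lambda_app, lambda_exp, lambda_per = 100.0, 100.0, 10.0

def training_loss(l_app: torch.Tensor, l_exp: torch.Tensor, l_per: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the appearance, expression, and perceptual terms.

    The definitions of the individual terms are the paper's and are not
    reproduced here; this function only applies the reported weights.
    """
    return lambda_app * l_app + lambda_exp * l_exp + lambda_per * l_per

# Example usage with dummy scalar losses (illustration only).
dummy_total = training_loss(torch.tensor(0.5), torch.tensor(0.3), torch.tensor(0.2))
print(dummy_total)  # 100.0*0.5 + 100.0*0.3 + 10.0*0.2 = 82.0
```

A faithful reproduction would replace the placeholder module and the dummy inputs with the paper's actual appearance, expression, and perceptual losses.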