Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Authors: Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

AAAI 2019, pp. 9299-9306

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work.
Researcher Affiliation | Academia | The Chinese University of Hong Kong, Hong Kong, China {zhouhang@link, yuliu@ee, zwliu@ie, xgwang@ee}.cuhk.edu.hk, pluo.lhi@gmail.com
Pseudocode | No | The paper describes the model architecture and training process in detail with figures and formulas, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper links to a project page 'https://liuziwei7.github.io/projects/Talking Face' but this page does not explicitly provide access to the source code for the described methodology.
Open Datasets | Yes | Our model is trained and evaluated on the LRW dataset (Chung and Zisserman 2016a)... the identity-preserving module of the network is trained on a subset of the MS-Celeb-1M dataset (Guo et al. 2016).
Dataset Splits | Yes | For each class, there are more than 800 training samples and 50 validation/test samples.
Hardware Specification | Yes | The batch size is set to be 18 with 1e-4 learning rate and trained on 6 Titan X GPUs.
Software Dependencies | No | The paper states 'We implemented DAVS using Pytorch.' but does not specify the version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | The batch size is set to be 18 with 1e-4 learning rate and trained on 6 Titan X GPUs. It takes about 4 epochs for the audio-visual speech recognition and person-identity recognition to converge and another 5 epochs for further tuning the generator.
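As a rough illustration of the reported setup, the sketch below wires the quoted hyperparameters (batch size 18, learning rate 1e-4, 6 GPUs, 4 epochs of recognition training plus 5 epochs of generator tuning) into a generic PyTorch training loop. The optimizer choice (Adam), the model and dataset objects, and the loss interface are hypothetical placeholders; the paper's text quoted above does not specify them.

```python
# Minimal sketch of the reported training configuration.
# Only batch size, learning rate, GPU count, and epoch counts come from the
# paper's quoted setup; everything else is a hypothetical placeholder.
import torch
from torch.utils.data import DataLoader

def train_stage(model, dataset, num_epochs):
    # The paper reports training on 6 Titan X GPUs.
    model = torch.nn.DataParallel(model.cuda(), device_ids=list(range(6)))
    loader = DataLoader(dataset, batch_size=18, shuffle=True, num_workers=4)
    # Adam is an assumption; the paper only states a 1e-4 learning rate.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(num_epochs):
        for audio, frames, labels in loader:  # placeholder batch layout
            optimizer.zero_grad()
            # Placeholder interface: the model is assumed to return its loss.
            loss = model(audio.cuda(), frames.cuda(), labels.cuda())
            loss.mean().backward()
            optimizer.step()

# Stage 1: ~4 epochs for audio-visual speech recognition and
# person-identity recognition to converge.
# train_stage(davs_model, lrw_dataset, num_epochs=4)
# Stage 2: ~5 more epochs for further tuning the generator.
# train_stage(davs_model, lrw_dataset, num_epochs=5)
```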