Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Authors: Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang (pp. 9299–9306)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. |
| Researcher Affiliation | Academia | The Chinese University of Hong Kong, Hong Kong, China {zhouhang@link, yuliu@ee, zwliu@ie, xgwang@ee}.cuhk.edu.hk, pluo.lhi@gmail.com |
| Pseudocode | No | The paper describes the model architecture and training process in detail with figures and formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to a project page, 'https://liuziwei7.github.io/projects/TalkingFace', but this page does not provide access to the source code for the described method. |
| Open Datasets | Yes | Our model is trained and evaluated on the LRW dataset (Chung and Zisserman 2016a)... the identity-preserving module of the network is trained on a subset of the MS-Celeb-1M dataset (Guo et al. 2016). |
| Dataset Splits | Yes | For each class, there are more than 800 training samples and 50 validation/test samples. |
| Hardware Specification | Yes | The batch size is set to be 18 with 1e-4 learning rate and trained on 6 Titan X GPUs. |
| Software Dependencies | No | The paper states 'We implemented DAVS using Pytorch.' but does not specify a version number for PyTorch or list any other software dependencies. |
| Experiment Setup | Yes | The batch size is set to be 18 with 1e-4 learning rate and trained on 6 Titan X GPUs. It takes about 4 epochs for the audio-visual speech recognition and person-identity recognition to converge and another 5 epochs for further tuning the generator. |