Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
Authors: Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, Changjie Fan
AAAI 2021, pp. 1911-1920
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on qualitative and quantitative results demonstrate that our algorithm achieves high-quality photorealistic talking-head videos including various facial expressions and head motions according to speech rhythms and outperforms the state-of-the-art. |
| Researcher Affiliation | Collaboration | 1 Netease Fuxi AI Lab; 2 University of Technology Sydney. {lilincheng, wangsuzhen, zhangzhimeng, dingyu01, zhengyixing01, fanchangjie}@corp.netease.com; xin.yu@uts.edu.au |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Both datasets are released for research purposes. https://github.com/FuxiVirtualHuman/Write-a-Speaker |
| Open Datasets | Yes | Both datasets are released for research purposes. https://github.com/FuxiVirtualHuman/Write-a-Speaker |
| Dataset Splits | No | The paper mentions using a 'groundtruth test data' from the Mocap dataset but does not specify the training, validation, or test split percentages or sample counts. |
| Hardware Specification | Yes | We implement the system using PyTorch on a single GTX 2080Ti. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'Adam' optimizer but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The loss weights are set to λmou = 50, λupp = 100, α = 10, β = 100, and γ = 100. We use the Adam (Kingma and Ba 2014) optimizer for all networks. For training Gmou, Gupp and Ghed, we set β1 = 0.5, β2 = 0.99, ϵ = 10⁻⁸, a batch size of 32, and an initial learning rate of 0.0005 for the generators and 0.00001 for the discriminators. The learning rates of Gmou stay fixed for the first 400 epochs and linearly decay to zero over another 400 epochs. The learning rates of Gupp and Ghed stay fixed for the first 50 epochs and linearly decay to zero over another 50 epochs. For training Gvid, we set β1 = 0.5, β2 = 0.999, ϵ = 10⁻⁸, a batch size of 3, and an initial learning rate of 0.0002 with linear decay to 0.0001 within 50 epochs. |
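
As a concrete reading of the Gmou schedule quoted above, here is a minimal PyTorch sketch: Adam with β1 = 0.5, β2 = 0.99, ϵ = 10⁻⁸, learning rates of 0.0005 (generator) and 0.00001 (discriminator), held fixed for 400 epochs and then linearly decayed to zero over another 400. The `G_mou` and `D_mou` modules are placeholder stand-ins, since the paper's network architectures are not reproduced in this report.

```python
import torch

# Placeholder stand-ins for the paper's mouth generator and its
# discriminator; the actual architectures are not specified here.
G_mou = torch.nn.Linear(64, 64)
D_mou = torch.nn.Linear(64, 1)

# Adam hyperparameters as quoted from the paper for Gmou training.
opt_G = torch.optim.Adam(G_mou.parameters(), lr=5e-4,
                         betas=(0.5, 0.99), eps=1e-8)
opt_D = torch.optim.Adam(D_mou.parameters(), lr=1e-5,
                         betas=(0.5, 0.99), eps=1e-8)

def hold_then_linear_decay(epoch, hold=400, decay=400):
    """Scale factor: 1.0 for the first `hold` epochs, then a linear
    ramp down to 0.0 over the following `decay` epochs."""
    if epoch < hold:
        return 1.0
    return max(0.0, 1.0 - (epoch - hold) / decay)

sched_G = torch.optim.lr_scheduler.LambdaLR(opt_G, lr_lambda=hold_then_linear_decay)
sched_D = torch.optim.lr_scheduler.LambdaLR(opt_D, lr_lambda=hold_then_linear_decay)

for epoch in range(800):
    # ... one training epoch over batches of 32 would go here ...
    sched_G.step()
    sched_D.step()
```

The same pattern covers Gupp and Ghed by setting `hold=50, decay=50`; for Gvid the paper instead uses β2 = 0.999 and decays from 0.0002 to 0.0001 (not to zero) over 50 epochs.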