Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
Authors: Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, Changjie Fan
AAAI 2021, pp. 1911-1920
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on qualitative and quantitative results demonstrate that our algorithm achieves high-quality photorealistic talking-head videos including various facial expressions and head motions according to speech rhythms and outperforms the state-of-the-art. |
| Researcher Affiliation | Collaboration | 1 Netease Fuxi AI Lab; 2 University of Technology Sydney. {lilincheng, wangsuzhen, zhangzhimeng, dingyu01, zhengyixing01, fanchangjie}@corp.netease.com; xin.yu@uts.edu.au |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Both datasets are released for research purposes. https://github.com/FuxiVirtualHuman/Write-a-Speaker |
| Open Datasets | Yes | Both datasets are released for research purposes. https://github.com/FuxiVirtualHuman/Write-a-Speaker |
| Dataset Splits | No | The paper mentions using a 'groundtruth test data' from the Mocap dataset but does not specify the training, validation, or test split percentages or sample counts. |
| Hardware Specification | Yes | We implement the system using PyTorch on a single GTX 2080Ti. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'Adam' optimizer but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The loss weights are set to λmou = 50, λupp = 100, α = 10, β = 100, and γ = 100. We use the Adam (Kingma and Ba 2014) optimizer for all networks. For training Gmou, Gupp and Ghed, we set β1 = 0.5, β2 = 0.99, ϵ = 10⁻⁸, a batch size of 32, and an initial learning rate of 0.0005 for the generators and 0.00001 for the discriminators. The learning rates of Gmou stay fixed for the first 400 epochs and linearly decay to zero over another 400 epochs. The learning rates of Gupp and Ghed stay fixed for the first 50 epochs and linearly decay to zero over another 50 epochs. For training Gvid, we set β1 = 0.5, β2 = 0.999, ϵ = 10⁻⁸, a batch size of 3, and an initial learning rate of 0.0002 with linear decay to 0.0001 within 50 epochs. |
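
As a concrete reading of the Gmou schedule quoted above, here is a minimal PyTorch sketch: Adam with β1 = 0.5, β2 = 0.99, ϵ = 10⁻⁸, learning rates of 0.0005 (generator) and 0.00001 (discriminator), held fixed for 400 epochs and then linearly decayed to zero over another 400. The `G_mou` and `D_mou` modules are placeholder stand-ins, since the paper's network architectures are not reproduced in this report.

```python
import torch

# Placeholder stand-ins for the paper's mouth generator and its
# discriminator; the actual architectures are not specified here.
G_mou = torch.nn.Linear(64, 64)
D_mou = torch.nn.Linear(64, 1)

# Adam hyperparameters as quoted from the paper for Gmou training.
opt_G = torch.optim.Adam(G_mou.parameters(), lr=5e-4,
                         betas=(0.5, 0.99), eps=1e-8)
opt_D = torch.optim.Adam(D_mou.parameters(), lr=1e-5,
                         betas=(0.5, 0.99), eps=1e-8)

def hold_then_linear_decay(epoch, hold=400, decay=400):
    """Scale factor: 1.0 for the first `hold` epochs, then a linear
    ramp down to 0.0 over the following `decay` epochs."""
    if epoch < hold:
        return 1.0
    return max(0.0, 1.0 - (epoch - hold) / decay)

sched_G = torch.optim.lr_scheduler.LambdaLR(opt_G, lr_lambda=hold_then_linear_decay)
sched_D = torch.optim.lr_scheduler.LambdaLR(opt_D, lr_lambda=hold_then_linear_decay)

for epoch in range(800):
    # ... one training epoch over batches of 32 would go here ...
    sched_G.step()
    sched_D.step()
```

The same pattern covers Gupp and Ghed by setting `hold=50, decay=50`; for Gvid the paper instead uses β2 = 0.999 and decays from 0.0002 to 0.0001 (not to zero) over 50 epochs.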