Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion

Authors: Suzhen Wang, Lincheng Li, Yu Ding, Changjie Fan, Xin Yu

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method produces videos with plausible head motions, synchronized facial expressions, and stable backgrounds and outperforms the state-of-the-art.
Researcher Affiliation | Collaboration | Suzhen Wang (1), Lincheng Li (1), Yu Ding (1), Changjie Fan (1), Xin Yu (2); (1) Virtual Human Group, NetEase Fuxi AI Lab, China; (2) University of Technology Sydney
Pseudocode | No | The paper contains mathematical formulations and architecture diagrams, but no explicit pseudocode or algorithm blocks.
Open Source Code | No | To ensure proper use, we will release our code and models to promote the progress in detecting fake videos.
Open Datasets | Yes | We use prevalent benchmark datasets VoxCeleb [Nagrani et al., 2017], GRID [Cooke et al., 2006] and LRW [Chung and Zisserman, 2016] to evaluate the proposed method.
Dataset Splits | Yes | We split each dataset into training and testing sets following the setting of previous works.
Hardware Specification | Yes | NH is trained on VoxCeleb for one day on one RTX 2080 Ti with batchsize 64. ... The training of ND and NI takes 3 days with batchsize 28, and that of NM takes one week with batchsize 4 on 4 RTX 2080 Ti.
Software Dependencies | No | All our networks are implemented using PyTorch.
Experiment Setup | Yes | We adopt Adam optimizer during training, with an initial learning rate of 2e-4 and weight decay to 2e-6. ... NH is trained on VoxCeleb for one day on one RTX 2080 Ti with batchsize 64. ... The training of ND and NI takes 3 days with batchsize 28, and that of NM takes one week with batchsize 4 on 4 RTX 2080 Ti.
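The Experiment Setup row reports only high-level hyperparameters (Adam, initial learning rate 2e-4, "weight decay to 2e-6", and per-network batch sizes). Below is a minimal PyTorch sketch of that configuration under stated assumptions: the HeadMotionPredictor module, its layer sizes, and the feature dimensions are placeholders invented for illustration (the code has not been released), and "weight decay to 2e-6" is read here as Adam's weight_decay parameter, although the quote could also mean the learning rate is decayed to 2e-6.

import torch
import torch.nn as nn

# Placeholder stand-in for the head-motion predictor (NH in the paper);
# the architecture and sizes below are illustrative assumptions, not taken from the paper.
class HeadMotionPredictor(nn.Module):
    def __init__(self, audio_feat_dim=41, pose_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, pose_dim),
        )

    def forward(self, audio_features):
        return self.net(audio_features)

model = HeadMotionPredictor()

# Optimizer settings quoted in the Experiment Setup row:
# Adam, initial learning rate 2e-4; weight_decay=2e-6 is one possible reading of the quote.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=2e-6)

# Batch size 64 is the value reported for training NH on VoxCeleb.
batch_size = 64
audio_batch = torch.randn(batch_size, 41)   # dummy audio features
target_poses = torch.randn(batch_size, 6)   # dummy head-pose targets

# One illustrative optimization step with a generic MSE loss (the paper's actual losses differ).
loss = nn.functional.mse_loss(model(audio_batch), target_poses)
optimizer.zero_grad()
loss.backward()
optimizer.step()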