Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos

Authors: Ying Tai, Yicong Liang, Xiaoming Liu, Lei Duan, Jilin Li, Chengjie Wang, Feiyue Huang, Yu Chen

AAAI 2019, pp. 8893–8900 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on 300W, 300VW and Talking Face datasets clearly demonstrate that the proposed method is more accurate and stable than the state-of-the-art models. We conduct extensive experiments on both image and video-based alignment datasets, including 300W (Sagonas et al. 2013), 300-VW (Shen et al. 2017) and Talking Face (TF) (FGNET 2014).
Researcher Affiliation | Collaboration | Youtu Lab, Tencent; Michigan State University; Fudan University; Nanjing University of Science and Technology
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/tyshiwo/FHR_alignment
Open Datasets | Yes | We conduct extensive experiments on both image and video-based alignment datasets, including 300W (Sagonas et al. 2013), 300-VW (Shen et al. 2017) and Talking Face (TF) (FGNET 2014).
Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly provide details about a separate validation split with percentages or sample counts.
Hardware Specification | Yes | Training our FHR on 300W takes 7 hours on a P100 GPU.
Software Dependencies | Yes | We train the network with the Torch7 toolbox (Collobert, Kavukcuoglu, and Farabet 2011), using the RMSprop algorithm with an initial learning rate of 2.5 × 10⁻⁴, a minibatch size of 6 and σ = 3.
Experiment Setup | Yes | We train the network with the Torch7 toolbox (Collobert, Kavukcuoglu, and Farabet 2011), using the RMSprop algorithm with an initial learning rate of 2.5 × 10⁻⁴, a minibatch size of 6 and σ = 3. During the stabilization training, we set λ₁ = λ₃ = 1 and λ₂ = 10 to make all terms in the stabilization loss (Eq. 11) on the same order of magnitude. We estimate the average variance ρ of z_i^(t) − p_i^(t) across all training videos and all landmarks, and empirically set the initial value of Γ_noise as ρI. Also, we initialize Γ₁ as the zero matrix O_{2M×2M}, Γ₂ as 10ρI, and γ = β₁ = β₂ = 0.5.
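
For anyone attempting to reproduce this setup, the sketch below collects the quoted hyperparameters in one place. It is a minimal assumption-laden example: the authors used Torch7 (Lua), whereas this uses PyTorch; the stand-in model, the landmark count M, and the variance estimate ρ are placeholders, and only the numeric settings (learning rate, batch size, σ, λ weights, Γ initializations) come from the quoted text.

```python
# Hypothetical PyTorch sketch of the reported training configuration.
# Not the authors' code (which was written for Torch7/Lua); model, M,
# and rho below are placeholders for illustration only.
import torch
import torch.nn as nn

M = 68        # number of facial landmarks (68 assumed, as in 300W)
rho = 0.01    # placeholder for the average variance of z_i^(t) - p_i^(t),
              # which the paper estimates over all training videos/landmarks

model = nn.Conv2d(3, M, kernel_size=3, padding=1)  # stand-in for the FHR network

# RMSprop with the reported initial learning rate; the mini-batch size of 6
# and heatmap Gaussian sigma = 3 would be applied in the data pipeline.
optimizer = torch.optim.RMSprop(model.parameters(), lr=2.5e-4)
batch_size = 6
heatmap_sigma = 3

# Stabilization-loss weights: lambda_1 = lambda_3 = 1, lambda_2 = 10.
lambda1, lambda2, lambda3 = 1.0, 10.0, 1.0

# Covariance initialization for the stabilization term (Eq. 11):
# Gamma_noise = rho * I, Gamma_1 = O_{2M x 2M}, Gamma_2 = 10 * rho * I.
I = torch.eye(2 * M)
Gamma_noise = rho * I
Gamma_1 = torch.zeros(2 * M, 2 * M)
Gamma_2 = 10 * rho * I
gamma = beta1 = beta2 = 0.5
```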