Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos
Authors: Ying Tai, Yicong Liang, Xiaoming Liu, Lei Duan, Jilin Li, Chengjie Wang, Feiyue Huang, Yu Chen
AAAI 2019, pp. 8893–8900
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 300W, 300VW and Talking Face datasets clearly demonstrate that the proposed method is more accurate and stable than the state-of-the-art models. We conduct extensive experiments on both image and video-based alignment datasets, including 300W (Sagonas et al. 2013), 300-VW (Shen et al. 2017) and Talking Face (TF) (FGNET 2014). |
| Researcher Affiliation | Collaboration | Youtu Lab, Tencent; Michigan State University; Fudan University; Nanjing University of Science and Technology |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/tyshiwo/FHR_alignment |
| Open Datasets | Yes | We conduct extensive experiments on both image and video-based alignment datasets, including 300W (Sagonas et al. 2013), 300-VW (Shen et al. 2017) and Talking Face (TF) (FGNET 2014). |
| Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly provide details about a separate validation split with percentages or sample counts. |
| Hardware Specification | Yes | Training our FHR on 300W takes 7 hours on a P100 GPU. |
| Software Dependencies | Yes | We train the network with the Torch7 toolbox (Collobert, Kavukcuoglu, and Farabet 2011), using the RMSprop algorithm with an initial learning rate of 2.5 × 10⁻⁴, a minibatch size of 6 and σ = 3. |
| Experiment Setup | Yes | We train the network with the Torch7 toolbox (Collobert, Kavukcuoglu, and Farabet 2011), using the RMSprop algorithm with an initial learning rate of 2.5 × 10⁻⁴, a minibatch size of 6 and σ = 3. During the stabilization training, we set λ₁ = λ₃ = 1 and λ₂ = 10 to make all terms in the stabilization loss (11) on the same order of magnitude. We estimate the average variance ρ of z_i^(t) − p_i^(t) across all training videos and all landmarks, and empirically set the initial value of Γ_noise as ρI. Also, we initialize Γ₁ as a zero matrix O_{2M×2M}, Γ₂ as 10ρI, and γ = β₁ = β₂ = 0.5. (A configuration sketch follows the table.) |
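
The reported setup boils down to a handful of hyperparameters. Below is a minimal sketch of that configuration, assuming a PyTorch-style training script; the original work used Torch7, and the backbone, loss-term names, and the `stabilization_loss` helper here are placeholders for illustration, not the authors' FHR implementation. Only the quoted values (RMSprop, lr = 2.5 × 10⁻⁴, batch size 6, σ = 3, λ₁ = λ₃ = 1, λ₂ = 10, and the Γ/γ/β initializations) come from the paper.

```python
# Hedged sketch of the training configuration reported for FHR.
# The network below is a stand-in, NOT the paper's architecture.
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder backbone
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 68, kernel_size=3, padding=1),  # one heatmap per landmark (68 assumed)
)

# Quoted hyperparameters
optimizer = torch.optim.RMSprop(model.parameters(), lr=2.5e-4)
BATCH_SIZE = 6        # minibatch size
SIGMA = 3.0           # std of the Gaussian used for ground-truth heatmaps

# Stabilization-loss weights (Eq. 11 in the paper)
LAMBDA1, LAMBDA2, LAMBDA3 = 1.0, 10.0, 1.0

def stabilization_loss(term1, term2, term3):
    """Weighted sum of the three stabilization terms; the terms themselves
    are computed elsewhere and are not reproduced in this sketch."""
    return LAMBDA1 * term1 + LAMBDA2 * term2 + LAMBDA3 * term3

# Covariance / mixing initializations quoted in the paper, where rho is the
# average variance of z_i^(t) - p_i^(t) over all training videos and landmarks:
#   Gamma_noise = rho * I,  Gamma_1 = 0 (zero matrix of size 2M x 2M),
#   Gamma_2 = 10 * rho * I, gamma = beta_1 = beta_2 = 0.5
```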