Towards Omni-Supervised Face Alignment for Large Scale Unlabeled Videos

Authors: Congcong Zhu, Hao Liu* (corresponding author), Zhenhua Yu, Xuehong Sun
Pages: 13090-13097

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that our approach surpasses most fully supervised state-of-the-art methods. To justify the effectiveness of the proposed STRRN, we present extensive experimental results and analysis on three publicly downloadable large-scale video datasets.
Researcher Affiliation | Academia | (1) School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China; (2) School of Information Engineering, Ningxia University, Yinchuan, 750021, China; (3) Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence, Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, 750021, China
Pseudocode | Yes | Algorithm 1: Training Procedure of Our STRRN
Open Source Code | No | The paper states 'More details will be made publicly in our release model and source code.', which indicates a planned future release rather than concrete access to the source code at the time of publication.
Open Datasets | Yes | Evaluation datasets: 300VW (Shen et al. 2015): the 300 Videos in the Wild (300VW) dataset was collected specifically for video-based face alignment. YouTube-Face (Wolf, Hassner, and Maoz 2011) and YouTube-Celebrities (Kim et al. 2008): we also leveraged these two large-scale unlabeled video datasets.
Dataset Splits | No | For the 300VW dataset, the paper states 'we utilized 50 sequences for training and the remaining 64 sequences were used for testing', but it does not specify a distinct validation split or cross-validation setup.
Hardware Specification | Yes | Training takes about 60 ms per frame on a single NVIDIA GTX 1080 Ti GPU (11 GB memory). Excluding face detection time, the model runs at 30 frames per second on a single Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz and requires about 2 GB of memory for runtime data loading.
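The throughput figures in the hardware row can be cross-checked with simple unit conversion. The sketch below converts per-frame latency to frames per second and back. The 100,000-frame workload is a hypothetical number chosen only to show scale; it is not a figure from the paper.

```python
# Unit conversions for the reported timings. Only the inputs 60 ms/frame (training,
# one GTX 1080 Ti) and 30 fps (inference, one i5-6500, detection excluded) come from the text.

def ms_per_frame_to_fps(ms_per_frame: float) -> float:
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / ms_per_frame


def fps_to_ms_per_frame(fps: float) -> float:
    """Per-frame latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


def minutes_for_frames(n_frames: int, ms_per_frame: float) -> float:
    """Wall-clock minutes to process n_frames at a fixed per-frame latency."""
    return n_frames * ms_per_frame / 1000.0 / 60.0


train_fps = ms_per_frame_to_fps(60.0)
infer_ms = fps_to_ms_per_frame(30.0)
minutes = minutes_for_frames(100_000, 60.0)  # hypothetical workload size

print(f"training throughput: {train_fps:.1f} frames/s")  # 16.7
print(f"inference latency: {infer_ms:.1f} ms/frame")      # 33.3
print(f"100,000 training frames: {minutes:.0f} min")     # 100
```

In other words, one training step per frame is roughly half the speed of CPU inference, which is consistent with training also running the backward pass.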
Software Dependencies | No | The paper mentions 'Tensorflow' but does not provide version numbers for it or for any other key software dependency.
Experiment Setup | Yes | For the hyper-parameters of STRRN, we empirically set the discount factor λ to 0.4 and the threshold T on the normalized RMSE to 0.02 when generating extra training annotations.
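To make the role of the threshold T concrete, here is a hypothetical sketch of accepting generated annotations only when their normalized error falls below 0.02. Several details are assumptions, not taken from the paper. Each candidate annotation is compared against a second prediction for the same frame. "Normalized RMSE" is taken as the mean point-to-point distance divided by the inter-ocular distance, a convention common on 300VW. Landmarks follow the 68-point scheme. The names `normalized_error` and `select_pseudo_labels` are illustrative. The discount factor λ is not modeled, because its exact use is not described here.

```python
import numpy as np

# Hypothetical pseudo-label filter. Assumptions (not from the paper):
# - a candidate annotation is compared with a reference prediction for the same frame;
# - normalized error = mean point-to-point distance / inter-ocular distance;
# - 68-point landmarks; 0-based indices 36 and 45 are the outer eye corners.
T = 0.02  # threshold on the normalized error, as given in the text
LEFT_EYE_OUTER, RIGHT_EYE_OUTER = 36, 45


def normalized_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean Euclidean distance between two (68, 2) landmark arrays,
    divided by the inter-ocular distance of the reference shape."""
    per_point = np.linalg.norm(pred - ref, axis=1)
    interocular = np.linalg.norm(ref[LEFT_EYE_OUTER] - ref[RIGHT_EYE_OUTER])
    return float(per_point.mean() / interocular)


def select_pseudo_labels(preds, refs, threshold: float = T) -> list:
    """Indices of frames whose candidate annotation is accepted (error below threshold)."""
    return [i for i, (p, r) in enumerate(zip(preds, refs))
            if normalized_error(p, r) < threshold]


# Synthetic example: outer eye corners 100 px apart, so a uniform 1 px shift
# gives error 0.01 (kept) and a uniform 3 px shift gives 0.03 (rejected).
rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 200.0, size=(68, 2))
ref[LEFT_EYE_OUTER] = [50.0, 80.0]
ref[RIGHT_EYE_OUTER] = [150.0, 80.0]
candidates = [ref + np.array([1.0, 0.0]), ref + np.array([3.0, 0.0])]
kept = select_pseudo_labels(candidates, [ref, ref])
print(kept)  # [0]
```

Because the error is divided by the inter-ocular distance, T = 0.02 corresponds to an average landmark displacement of 2% of the eye-corner span, so the same threshold applies to faces of any size.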