Teacher Guided Neural Architecture Search for Face Recognition

Authors: Xiaobo Wang

AAAI 2021, pp. 2817-2825 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on a variety of face recognition benchmarks have demonstrated the superiority of our method over the state-of-the-art alternatives.
Researcher Affiliation | Collaboration | (1) Sangfor Technologies Inc., Shenzhen, China; (2) CBSR & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Pseudocode | Yes | Algorithm 1: Teacher Guided Neural Architecture Search (TNAS)
Open Source Code | No | The paper mentions that the script for the teacher network is publicly available and that experiments are implemented using PyTorch. However, it provides neither a link nor an explicit statement about the availability of the source code for the proposed method (TNAS).
Open Datasets | Yes | This paper involves two popular training datasets, including CASIA-WebFace (Yi et al. 2014) and MS-Celeb-1M (Guo et al. 2016). ... We use nine face recognition benchmarks, including LFW (Huang et al. 2007), SLLFW (Deng et al. 2017), CALFW (Zheng et al. 2017), CPLFW (Zheng et al. 2018), AgeDB (Moschoglou et al. 2017), CFP (Sengupta et al. 2016), RFW (Wang et al. 2018c), MegaFace (Nech and Kemelmacher-Shlizerman 2017) and Trillion-Pairs...
Dataset Splits | Yes | We adopt the CASIA-WebFace-R as the training set and the LFW as the validation set to search student networks.
Hardware Specification | No | The paper mentions that experiments are implemented using PyTorch and discusses deployment on 'mobile and embedded devices', but it does not specify hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper states: 'All experiments in this paper are implemented by Pytorch library (Paszke et al. 2019).' While PyTorch is mentioned, a specific version number is not provided, so the software dependencies are not fully reproducible.
Experiment Setup | Yes | For searching the expected student networks, we sample the number of channels over {0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}... We optimize the weights via SGD and the architecture parameters via Adam. For the weights, we start the learning rate from 0.1 and reduce it by the cosine scheduler... For the architecture parameters, we use a constant learning rate of 0.001 and a weight decay of 0.001. The toleration ratio t is always set as 5%. The softmax temperature τ in Eq. (5) is linearly decayed from 10 to 0.1. We adopt the CASIA-WebFace-R as the training set and the LFW as the validation set to search student networks. For training the searched student networks, all of them are trained from scratch by the SGD algorithm with batch size 256. The weight decay is set to 0.0005 and the momentum is 0.9. The learning rate is initially 0.1. On the CASIA-WebFace-R dataset, we empirically divide the learning rate by 10 at 9, 18, and 26 epochs and finish the training process at 30 epochs. On the MS-Celeb-1M-v1c-R dataset, we divide the learning rate by 10 at 4, 8, and 10 epochs, and finish the training process at 12 epochs.
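
The search hyperparameters reported above map directly onto standard PyTorch optimizers. Below is a minimal sketch of that setup, assuming a DARTS-style relaxation in which each searchable layer holds one logit per candidate channel ratio; `supernet`, `arch_logits`, and `temperature` are illustrative names for this sketch, not the authors' released code.

```python
import torch
import torch.nn as nn

total_epochs = 30
channel_ratios = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Stand-ins: the TNAS supernet is not public, so an arbitrary module and a
# single vector of architecture logits (one per channel ratio) are used here.
supernet = nn.Linear(512, 512)
arch_logits = nn.Parameter(torch.zeros(len(channel_ratios)))

# Supernet weights: SGD starting at lr 0.1, decayed by a cosine scheduler.
w_opt = torch.optim.SGD(supernet.parameters(), lr=0.1, momentum=0.9)
w_sched = torch.optim.lr_scheduler.CosineAnnealingLR(w_opt, T_max=total_epochs)

# Architecture parameters: Adam with constant lr 0.001 and weight decay 0.001.
a_opt = torch.optim.Adam([arch_logits], lr=1e-3, weight_decay=1e-3)

def temperature(epoch: int, t_max: float = 10.0, t_min: float = 0.1) -> float:
    """Softmax temperature, linearly decayed from 10 to 0.1 over the search."""
    frac = epoch / max(total_epochs - 1, 1)
    return t_max + frac * (t_min - t_max)

# Relaxed distribution over candidate channel ratios for one searchable layer.
tau = temperature(epoch=0)
probs = torch.softmax(arch_logits / tau, dim=0)
```

Keeping the architecture learning rate constant while annealing the weight learning rate is a common choice in differentiable NAS: the weights need a decaying schedule to converge, while the architecture distribution sharpens through the temperature decay instead.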
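The retraining recipe for the searched students is likewise fully specified by standard hyperparameters. The sketch below wires them into a plain PyTorch loop; the dataset and student network are synthetic placeholders, since neither the searched architectures nor the cleaned training data are distributed with the paper, and a plain softmax cross-entropy stands in for whatever recognition loss the paper actually uses.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; CASIA-WebFace-R and the searched student networks
# are not reproduced here. Feature size and class count are hypothetical.
train_set = TensorDataset(torch.randn(1024, 512),
                          torch.randint(0, 1000, (1024,)))
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
student = nn.Linear(512, 1000)  # placeholder for a searched student network

optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# CASIA-WebFace-R schedule: lr / 10 at epochs 9, 18, 26; training ends at 30.
# (On MS-Celeb-1M-v1c-R the milestones would be [4, 8, 10] over 12 epochs.)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[9, 18, 26], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(student(features), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```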