Loss Function Search for Face Recognition

Authors: Xiaobo Wang, Shuo Wang, Cheng Chi, Shifeng Zhang, Tao Mei

Venue: ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the face recognition benchmarks, including LFW, SLLFW, CALFW, CPLFW, AgeDB, CFP, RFW, MegaFace and Trillion Pairs, which have verified the superiority of our new approach over the baseline Softmax loss, the handcrafted heuristic margin-based Softmax losses, and the AutoML method AM-LFS.
Researcher Affiliation | Collaboration | 1) JD AI Research; 2) Institute of Automation, Chinese Academy of Sciences. Correspondence to: Shifeng Zhang <shifeng.zhang@nlpr.ia.ac.cn>.
Pseudocode | Yes | Algorithm 1: Search-Softmax.
Open Source Code | Yes | To allow more experimental verification, our code is available at http://www.cbsr.ia.ac.cn/users/xiaobowang/.
Open Datasets | Yes | This paper involves two popular training datasets, including CASIA-WebFace (Yi et al., 2014) and MS-Celeb-1M (Guo et al., 2016).
Dataset Splits | Yes | In the outer level, we optimize the modulating factor a by REINFORCE (Williams, 1992) with rewards (i.e., accuracy on LFW) from a fixed number of sampled models. (See the REINFORCE sketch after this table.)
Hardware Specification | Yes | For all the datasets, each sampled model is trained with 2 P40 GPUs, so a total of 8 GPUs are used.
Software Dependencies | No | The paper states: 'All experiments in this paper are implemented by PyTorch (Paszke et al., 2019).' However, a specific version number for PyTorch is not provided.
Experiment Setup | Yes | The total batch size is 128. The weight decay is set to 0.0005 and the momentum is 0.9. The learning rate is initially 0.1. For CASIA-WebFace-R, we empirically divide the learning rate by 10 at epochs 9, 18, and 26, and finish the training process at epoch 30. For MS-Celeb-1M-v1c-R, we divide the learning rate by 10 at epochs 4, 8, and 10, and finish the training process at epoch 12. ... We use the Adam optimizer with a learning rate of η = 0.05 and set σ = 0.2 for updating the distribution parameter µ. (See the training-setup sketch after this table.)
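The "Dataset Splits" row quotes the paper's outer-loop search: the distribution parameter µ of the modulating factor is updated by REINFORCE, using LFW verification accuracy of sampled models as the reward. Below is a minimal sketch of one such update, assuming a Gaussian sampling distribution N(µ, σ²); the paper reports σ = 0.2 and an Adam learning rate of η = 0.05, but the exact parameterization and the `train_and_evaluate` helper here are assumptions, not the authors' code.

```python
import torch


def train_and_evaluate(a: float) -> float:
    """Hypothetical placeholder: train one model with modulating factor `a`
    and return its LFW verification accuracy as the reward."""
    return 0.0  # dummy reward so the sketch runs end-to-end


mu = torch.tensor(0.0, requires_grad=True)   # distribution parameter updated by REINFORCE
sigma = 0.2                                  # std reported in the paper
optimizer = torch.optim.Adam([mu], lr=0.05)  # eta = 0.05, as reported
num_models = 4                               # "a fixed number of sampled models"; 4 inferred
                                             # from the 8-GPU / 2-GPUs-per-model hardware note
search_steps = 10                            # assumed; not stated in the excerpt

for step in range(search_steps):
    dist = torch.distributions.Normal(mu, sigma)
    factors = dist.sample((num_models,))  # sample candidate modulating factors a
    rewards = torch.tensor([train_and_evaluate(a.item()) for a in factors])
    baseline = rewards.mean()             # variance-reduction baseline (assumed)
    # REINFORCE: ascend E[R] by descending -E[(R - b) * log pi(a | mu)]
    loss = -((rewards - baseline) * dist.log_prob(factors)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```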
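The "Experiment Setup" row maps directly onto a standard PyTorch training configuration. This is a minimal sketch of the CASIA-WebFace-R schedule under stated assumptions: the backbone, loss, and data loader are placeholders, and the searched Softmax loss itself is not reproduced here.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)          # placeholder for the face recognition backbone
criterion = nn.CrossEntropyLoss()   # placeholder for the searched Softmax loss
# Dummy batch standing in for the real data loader; total batch size is 128.
train_loader = [(torch.randn(128, 512), torch.randint(0, 10, (128,)))]

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate, as reported
    momentum=0.9,       # as reported
    weight_decay=5e-4,  # as reported
)
# Divide the learning rate by 10 at epochs 9, 18, and 26; stop after epoch 30.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[9, 18, 26], gamma=0.1
)

for epoch in range(30):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```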