Softmax Dissection: Towards Understanding Intra- and Inter-Class Objective for Embedding Learning
Authors: Lanqing He, Zhongdao Wang, Yali Li, Shengjin Wang
AAAI 2020, pp. 10957-10964
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The face recognition experiments on regular-scale data show D-Softmax is favorably comparable to existing losses such as SphereFace and ArcFace. Experiments on massive-scale data show the fast variants significantly accelerate the training process (such as 64×) with only a minor sacrifice in performance, outperforming existing acceleration methods for Softmax in terms of both performance and efficiency. Evaluation: We validate the effectiveness of the proposed D-Softmax on the face recognition task. The testing datasets include LFW (Huang et al. 2008), CFP-FP (Sengupta et al. 2016), AgeDB-30 (Moschoglou et al. 2017), IJB-C (Maze et al. 2018) and MegaFace (Kemelmacher-Shlizerman et al. 2016). LFW is a standard face verification benchmark that includes 6,000 pairs of faces, and the evaluation metric is verification accuracy via 10-fold cross-validation. CFP-FP and AgeDB-30 are similar to LFW but emphasize frontal-profile and cross-age face verification, respectively. IJB-C is a large-scale benchmark for template-based face recognition. A face template is composed of multiple face images or video face tracks. Features are simply average-pooled within a template to obtain the template feature. The evaluation metric is the true accept rate (TAR) at different false alarm rates (FAR); a hedged sketch of this evaluation convention appears after the table. The MegaFace identification challenge is a large-scale benchmark that evaluates performance against a million distractors. We report the rank-1 identification accuracy with 10^6 distractors on the refined version used by ArcFace. |
| Researcher Affiliation | Academia | Lanqing He*, Zhongdao Wang, Yali Li, Shengjin Wang. Department of Electronic Engineering, Tsinghua University. {hlq17, wcd17}@mails.tsinghua.edu.cn, liyali13@mail.tsinghua.edu.cn, wgsgj@tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not explicitly provide a link or statement about the release of its own source code. |
| Open Datasets | Yes | Training. We adopt the MS-Celeb-1M (Guo et al. 2016) dataset for training. Since the original MS-Celeb-1M contains wrong annotations, we adopt a cleaned version that is also used in ArcFace. The cleaned MS-Celeb-1M consists of around 5.8M images of 85K identities. Moreover, to validate the effectiveness and efficiency of the proposed losses on massive-scale data, we combine MS-Celeb-1M with the MegaFace2 (Nech and Kemelmacher-Shlizerman 2017) dataset to obtain a large training set. The MegaFace2 dataset consists of 4.7M images of 672K identities, so the joint dataset has 9.5M images of 757K identities in total. |
| Dataset Splits | No | The paper mentions using MS-Celeb-1M for training and various datasets for testing (LFW, CFP-FP, AgeDB-30, IJB-C, MegaFace), including 10-fold cross-validation for LFW. However, it does not explicitly specify a train/validation/test split for the primary training dataset (MS-Celeb-1M) or for the combined dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments, only general mentions like 'GPU memory' or 'CPU RAM'. |
| Software Dependencies | No | The paper mentions a 'standard ResNet-50' as the model backbone but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA, specific libraries). |
| Experiment Setup | Yes | All the models are standard ResNet-50 (He et al. 2016), trained on MS-Celeb-1M. We set the scale s = 32, the margin m1 = 4 for SphereFace and m2 = 0.5 for ArcFace, for the best performance. The other hyperparameters are the same. Selection of d: by tuning the hyperparameter ε in L_D, we are able to set the optimal d. Table 1 shows the performance of L_D with different settings of d. We train several ResNet-50 models with a batch size of 256 and employ D-Softmax-B as the objective, with sampling rates varying from 1 to 1/256. (A hedged sketch of the dissected loss follows the table.) |
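
For concreteness, here is a minimal sketch of the two evaluation conventions quoted in the Research Type row: average-pooled template features for IJB-C, and TAR at a fixed FAR. The function names (`template_feature`, `tar_at_far`) and the impostor-score thresholding convention are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def template_feature(image_features):
    """Average-pool per-image embeddings into one template feature, then L2-normalize.

    image_features: array of shape (num_images, feat_dim) for one IJB-C template.
    """
    f = np.mean(np.asarray(image_features, dtype=float), axis=0)
    return f / np.linalg.norm(f)

def tar_at_far(scores, labels, far_targets=(1e-4, 1e-5)):
    """True Accept Rate at given False Alarm Rates for a verification protocol.

    scores: similarity scores for all pairs (higher = more similar).
    labels: 1 for genuine (same-identity) pairs, 0 for impostor pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    genuine = np.sort(scores[labels == 1])
    impostor = np.sort(scores[labels == 0])
    results = {}
    for far in far_targets:
        # Pick the threshold so the fraction of impostor scores at or above it is ~FAR.
        k = int(np.ceil(far * len(impostor)))
        thresh = impostor[-k] if k > 0 else np.inf
        results[far] = float(np.mean(genuine >= thresh))
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy scores: genuine pairs score higher on average than impostors.
    scores = np.concatenate([rng.normal(0.6, 0.1, 1000), rng.normal(0.2, 0.1, 100000)])
    labels = np.concatenate([np.ones(1000, int), np.zeros(100000, int)])
    print(tar_at_far(scores, labels, far_targets=(1e-3, 1e-4)))
```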
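The Experiment Setup row quotes the D-Softmax configuration without its formula. The sketch below assumes one plausible reading of the dissection: the softmax cross-entropy log(1 + Σ_{j≠y} e^{z_j − z_y}) is split into an intra-class term log(1 + e^{ε − z_y}) and an inter-class term log(1 + Σ_{j≠y} e^{z_j − ε}), with z_j = s·cos θ_j and d = e^ε decoupling the two parts. The class name `DSoftmaxLoss` and this exact decoupled form are assumptions inferred from the paper's description, not verified against the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSoftmaxLoss(nn.Module):
    """Sketch of a dissected softmax ("D-Softmax") objective (assumed form).

    Splits the usual softmax cross-entropy into an intra-class term
    log(1 + e^(eps - z_y)) and an inter-class term
    log(1 + sum_{j != y} e^(z_j - eps)), where z_j = s * cos(theta_j)
    and d = e^eps replaces the coupling between the two parts.
    """

    def __init__(self, feat_dim, num_classes, s=32.0, eps=16.0):
        super().__init__()
        self.s = s        # scale on cosine logits (s = 32 in the quoted setup)
        self.eps = eps    # hyperparameter ε; d = e^ε (value here is illustrative)
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, features, targets):
        # Cosine similarities between L2-normalized features and class weights.
        logits = self.s * F.linear(F.normalize(features), F.normalize(self.weight))
        target_logit = logits.gather(1, targets.unsqueeze(1)).squeeze(1)

        # Intra-class term: pull each sample toward its own class center.
        intra = F.softplus(self.eps - target_logit)

        # Inter-class term: push away all non-target class centers.
        mask = F.one_hot(targets, logits.size(1)).bool()
        neg = logits.masked_fill(mask, float('-inf'))
        inter = F.softplus(torch.logsumexp(neg - self.eps, dim=1))

        return (intra + inter).mean()
```

Under this reading, the fast D-Softmax-B variant quoted in the setup row could subsample the negative classes in the inter-class term (e.g., keeping 1/256 of the weight columns per step), since that term no longer interacts with the target logit. This is consistent with the sampling rates the row reports, but it is likewise an assumption about the mechanism rather than a confirmed description of it.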