Softmax Dissection: Towards Understanding Intra- and Inter-Class Objective for Embedding Learning
Authors: Lanqing He, Zhongdao Wang, Yali Li, Shengjin Wang
AAAI 2020, pp. 10957-10964
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The face recognition experiments on regular-scale data show D-Softmax is favorably comparable to existing losses such as SphereFace and ArcFace. Experiments on massive-scale data show the fast variants significantly accelerate the training process (such as 64×) with only a minor sacrifice in performance, outperforming existing acceleration methods for Softmax in terms of both performance and efficiency. Evaluation: We validate the effectiveness of the proposed D-Softmax on the face recognition task. The testing datasets include LFW (Huang et al. 2008), CFP-FP (Sengupta et al. 2016), AgeDB-30 (Moschoglou et al. 2017), IJB-C (Maze et al. 2018) and MegaFace (Kemelmacher-Shlizerman et al. 2016). LFW is a standard face verification benchmark that includes 6,000 pairs of faces, and the evaluation metric is verification accuracy via 10-fold cross-validation. CFP-FP and AgeDB-30 are similar to LFW but emphasize frontal-profile and cross-age face verification, respectively. IJB-C is a large-scale benchmark for template-based face recognition. A face template is composed of multiple face images or video face tracks. Features are simply average-pooled within a template to obtain the template feature. The evaluation metric is the true accept rate (TAR) at different false alarm rates (FAR); a hedged sketch of this evaluation convention appears after the table. The MegaFace identification challenge is a large-scale benchmark that evaluates performance against a million distractors. We report the rank-1 identification accuracy with 10^6 distractors on the refined version used by ArcFace. |
| Researcher Affiliation | Academia | Lanqing He*, Zhongdao Wang, Yali Li, Shengjin Wang. Department of Electronic Engineering, Tsinghua University. {hlq17, wcd17}@mails.tsinghua.edu.cn, liyali13@mail.tsinghua.edu.cn, wgsgj@tsinghua.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not explicitly provide a link or statement about the release of its own source code. |
| Open Datasets | Yes | Training. We adopt the MS-Celeb-1M (Guo et al. 2016) dataset for training. Since the original MS-Celeb-1M contains wrong annotations, we adopt a cleaned version that is also used in ArcFace. The cleaned MS-Celeb-1M consists of around 5.8M images of 85K identities. Moreover, to validate the effectiveness and efficiency of the proposed losses on massive-scale data, we combine MS-Celeb-1M with the MegaFace2 (Nech and Kemelmacher-Shlizerman 2017) dataset to obtain a large training set. The MegaFace2 dataset consists of 4.7M images of 672K identities, so the joint dataset has 9.5M images of 757K identities in total. |
| Dataset Splits | No | The paper mentions using MS-Celeb-1M for training and various datasets for testing (LFW, CFP-FP, AgeDB-30, IJB-C, MegaFace), including 10-fold cross-validation for LFW. However, it does not explicitly specify a train/validation/test split for the primary training dataset (MS-Celeb-1M) or for the combined dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments, only general mentions like 'GPU memory' or 'CPU RAM'. |
| Software Dependencies | No | The paper mentions a 'standard ResNet-50' as the model backbone but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA, specific libraries). |
| Experiment Setup | Yes | All the models are standard ResNet-50 (He et al. 2016), trained on MS-Celeb-1M. We set the scale s = 32, the margin m1 = 4 for SphereFace and m2 = 0.5 for ArcFace, for the best performance. The other hyperparameters are the same. Selection of d: by tuning the hyperparameter ε in L_D, we are able to set the optimal d. Table 1 shows the performance of L_D with different settings of d. We train several ResNet-50 models with a batch size of 256 and employ D-Softmax-B as the objective, with sampling rates varying from 1 to 1/256. (A hedged sketch of the dissected loss follows the table.) |
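
For concreteness, here is a minimal sketch of the two evaluation conventions quoted in the Research Type row: average-pooled template features for IJB-C, and TAR at a fixed FAR. The function names (`template_feature`, `tar_at_far`) and the impostor-score thresholding convention are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def template_feature(image_features):
    """Average-pool per-image embeddings into one template feature, then L2-normalize.

    image_features: array of shape (num_images, feat_dim) for one IJB-C template.
    """
    f = np.mean(np.asarray(image_features, dtype=float), axis=0)
    return f / np.linalg.norm(f)

def tar_at_far(scores, labels, far_targets=(1e-4, 1e-5)):
    """True Accept Rate at given False Alarm Rates for a verification protocol.

    scores: similarity scores for all pairs (higher = more similar).
    labels: 1 for genuine (same-identity) pairs, 0 for impostor pairs.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    genuine = np.sort(scores[labels == 1])
    impostor = np.sort(scores[labels == 0])
    results = {}
    for far in far_targets:
        # Pick the threshold so the fraction of impostor scores at or above it is ~FAR.
        k = int(np.ceil(far * len(impostor)))
        thresh = impostor[-k] if k > 0 else np.inf
        results[far] = float(np.mean(genuine >= thresh))
    return results

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy scores: genuine pairs score higher on average than impostors.
    scores = np.concatenate([rng.normal(0.6, 0.1, 1000), rng.normal(0.2, 0.1, 100000)])
    labels = np.concatenate([np.ones(1000, int), np.zeros(100000, int)])
    print(tar_at_far(scores, labels, far_targets=(1e-3, 1e-4)))
```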
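The Experiment Setup row quotes the D-Softmax configuration without its formula. The sketch below assumes one plausible reading of the dissection: the softmax cross-entropy log(1 + Σ_{j≠y} e^{z_j − z_y}) is split into an intra-class term log(1 + e^{ε − z_y}) and an inter-class term log(1 + Σ_{j≠y} e^{z_j − ε}), with z_j = s·cos θ_j and d = e^ε decoupling the two parts. The class name `DSoftmaxLoss` and this exact decoupled form are assumptions inferred from the paper's description, not verified against the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSoftmaxLoss(nn.Module):
    """Sketch of a dissected softmax ("D-Softmax") objective (assumed form).

    Splits the usual softmax cross-entropy into an intra-class term
    log(1 + e^(eps - z_y)) and an inter-class term
    log(1 + sum_{j != y} e^(z_j - eps)), where z_j = s * cos(theta_j)
    and d = e^eps replaces the coupling between the two parts.
    """

    def __init__(self, feat_dim, num_classes, s=32.0, eps=16.0):
        super().__init__()
        self.s = s        # scale on cosine logits (s = 32 in the quoted setup)
        self.eps = eps    # hyperparameter ε; d = e^ε (value here is illustrative)
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, features, targets):
        # Cosine similarities between L2-normalized features and class weights.
        logits = self.s * F.linear(F.normalize(features), F.normalize(self.weight))
        target_logit = logits.gather(1, targets.unsqueeze(1)).squeeze(1)

        # Intra-class term: pull each sample toward its own class center.
        intra = F.softplus(self.eps - target_logit)

        # Inter-class term: push away all non-target class centers.
        mask = F.one_hot(targets, logits.size(1)).bool()
        neg = logits.masked_fill(mask, float('-inf'))
        inter = F.softplus(torch.logsumexp(neg - self.eps, dim=1))

        return (intra + inter).mean()
```

Under this reading, the fast D-Softmax-B variant quoted in the setup row could subsample the negative classes in the inter-class term (e.g., keeping 1/256 of the weight columns per step), since that term no longer interacts with the target logit. This is consistent with the sampling rates the row reports, but it is likewise an assumption about the mechanism rather than a confirmed description of it.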