Grouped Knowledge Distillation for Deep Face Recognition
Authors: Weisong Zhao, Xiangyu Zhu, Kaiwen Guo, Xiao-Yu Zhang, Zhen Lei
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on popular face recognition benchmarks demonstrate the superiority of proposed GKD over state-of-the-art methods. In this section, we first conduct the ablation experiments on Primary-KD, Secondary-KD, and Binary-KD. Then, we explore the effects of different cumulative probability threshold τ and that of hyper-parameters λ1 and λ2. |
| Researcher Affiliation | Academia | Weisong Zhao 1,3*, Xiangyu Zhu 2,4*, Kaiwen Guo 2, Xiao-Yu Zhang 1,3, Zhen Lei 2,4,5. 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2 CBSR&NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; 4 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 5 Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong, China |
| Pseudocode | No | The information is insufficient. The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The information is insufficient. The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We utilize the refined MS1MV2 (Deng et al. 2019a) as our training set for fair comparisons with other SOTA methods. MS1MV2 consists of 5.8M facial images of 85K individuals. |
| Dataset Splits | No | The information is insufficient. The paper describes the training set (MS1MV2) and testing sets, but does not provide specific details on training/validation/test splits, such as percentages or sample counts for each split, nor does it cite a predefined split with such details. |
| Hardware Specification | Yes | We set the batch size to 128 for each GPU in all experiments, and train models on 8 NVIDIA Tesla V100 (32GB) GPUs. |
| Software Dependencies | Yes | The batch size is set to 8 and the PyTorch version is 1.7.1. |
| Experiment Setup | Yes | We set the batch size to 128 for each GPU in all experiments, and train models on 8 NVIDIA Tesla V100 (32GB) GPUs. We apply the SGD optimization method and divide the initial learning rate (0.1) at 10, 18, and 24 epochs. The momentum is set to 0.9, and the weight decay is 5e-4. The hyper-parameters λ1 and λ2 are set to 8.0 and 1.0, respectively. For ArcFace, we follow the common setting with s = 64 and margin m = 0.5. |
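
As a reading aid, here is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row. The backbone, the `ArcFaceHead` class, and the 85K-class head size are illustrative assumptions (the GKD distillation losses themselves are omitted); only the quoted hyper-parameters (SGD with lr 0.1 divided at epochs 10, 18, and 24, momentum 0.9, weight decay 5e-4, s = 64, m = 0.5, λ1 = 8.0, λ2 = 1.0) come from the paper.

```python
# Minimal sketch of the reported training setup (hypothetical names; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceHead(nn.Module):
    """ArcFace margin logits with the common setting s = 64, margin m = 0.5."""

    def __init__(self, feat_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        # Add the angular margin m to the ground-truth class only, then rescale by s.
        return torch.where(target, torch.cos(theta + self.m), cosine) * self.s


# Placeholder student backbone; any face-recognition network producing 512-d features fits.
student = nn.Sequential(nn.Flatten(), nn.Linear(112 * 112 * 3, 512))
head = ArcFaceHead(feat_dim=512, num_classes=85_000)  # MS1MV2 has ~85K identities

# SGD with momentum 0.9 and weight decay 5e-4, initial learning rate 0.1,
# divided (by the usual factor of 10, an assumption) at epochs 10, 18, and 24.
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(head.parameters()),
    lr=0.1, momentum=0.9, weight_decay=5e-4,
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 18, 24], gamma=0.1
)

# Weights for the distillation terms; the GKD loss itself is not reproduced here.
lambda1, lambda2 = 8.0, 1.0
```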