Cross-Layer Distillation with Semantic Calibration
Authors: Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen
Pages: 7028-7036
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention based soft layer association mechanism for cross-layer distillation. |
| Researcher Affiliation | Collaboration | (1) College of Computer Science, Zhejiang University, China. (2) Zhejiang Provincial Key Laboratory of Service Robot. (3) Zhejiang University-Lianlian Pay Joint Research Center. (4) College of Computer Science, Zhejiang University of Technology, China. |
| Pseudocode | Yes | Algorithm 1 Semantic Calibration for Distillation. |
| Open Source Code | Yes | The code is available at https://github.com/DefangChen/SemCKD. |
| Open Datasets | Yes | We conduct a series of classification tasks on the CIFAR100 (Krizhevsky and Hinton 2009) and ImageNet datasets (Russakovsky et al. 2015). |
| Dataset Splits | No | The paper mentions using CIFAR-100 and ImageNet datasets and refers to a 'training dataset D' and 'mini-batch with size b', but does not explicitly state the percentages or counts for training, validation, or test splits. It only mentions 'Top-1 test accuracy'. |
| Hardware Specification | No | The detailed descriptions of computing infrastructure, network architectures, data processing, hyper-parameters in model optimization for reproducibility as well as more results are included in the technical appendix. |
| Software Dependencies | No | The detailed descriptions of computing infrastructure, network architectures, data processing, hyper-parameters in model optimization for reproducibility as well as more results are included in the technical appendix. |
| Experiment Setup | Yes | The range of hyper-parameter β for SemCKD is set as 100 to 1100 at equal interval of 100, while the hyper-parameter β for CRD ranges from 0.5 to 1.5 at equal interval of 0.1, adopting the same search space as the original paper (Tian, Krishnan, and Isola 2020). |
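
The two β search spaces quoted in the Experiment Setup row can be enumerated as follows. This is a minimal sketch, not code from the SemCKD repository: the helper `linear_grid` and the variable names are hypothetical, and it assumes both endpoints of each range are included in the search.

```python
# Hypothetical helper: only the ranges and step sizes come from the
# Experiment Setup row; inclusive endpoints are an assumption.
def linear_grid(start, stop, step, ndigits=None):
    """Return evenly spaced values from start to stop, both inclusive."""
    count = round((stop - start) / step) + 1
    values = [start + i * step for i in range(count)]
    if ndigits is not None:
        # Round away floating-point noise such as 0.8000000000000002.
        values = [round(v, ndigits) for v in values]
    return values


# beta for SemCKD: 100 to 1100 in steps of 100.
semckd_betas = linear_grid(100, 1100, 100)

# beta for CRD: 0.5 to 1.5 in steps of 0.1.
crd_betas = linear_grid(0.5, 1.5, 0.1, ndigits=1)

print(semckd_betas)
print(crd_betas)
```

Under the inclusive-endpoint assumption, each grid contains 11 candidate values.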