Cross-Layer Distillation with Semantic Calibration

Authors: Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen

AAAI 2021, pp. 7028-7036 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention-based soft layer association mechanism for cross-layer distillation.
Researcher Affiliation | Collaboration | 1. College of Computer Science, Zhejiang University, China. 2. Zhejiang Provincial Key Laboratory of Service Robot. 3. Zhejiang University-Lianlian Pay Joint Research Center. 4. College of Computer Science, Zhejiang University of Technology, China.
Pseudocode | Yes | Algorithm 1: Semantic Calibration for Distillation.
Open Source Code | Yes | The code is available at https://github.com/DefangChen/SemCKD.
Open Datasets | Yes | We conduct a series of classification tasks on the CIFAR-100 (Krizhevsky and Hinton 2009) and ImageNet (Russakovsky et al. 2015) datasets.
Dataset Splits | No | The paper mentions using the CIFAR-100 and ImageNet datasets and refers to a "training dataset D" and a "mini-batch with size b", but does not explicitly state the percentages or counts for training, validation, or test splits. It only mentions "Top-1 test accuracy".
Hardware Specification | No | The detailed descriptions of computing infrastructure, network architectures, data processing, and hyper-parameters in model optimization for reproducibility, as well as more results, are included in the technical appendix.
Software Dependencies | No | The detailed descriptions of computing infrastructure, network architectures, data processing, and hyper-parameters in model optimization for reproducibility, as well as more results, are included in the technical appendix.
Experiment Setup | Yes | The range of hyper-parameter β for SemCKD is set as 100 to 1100 at equal intervals of 100, while the hyper-parameter β for CRD ranges from 0.5 to 1.5 at equal intervals of 0.1, adopting the same search space as the original paper (Tian, Krishnan, and Isola 2020).
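To make the "attention-based soft layer association" named in the table concrete, below is a minimal NumPy sketch of the general idea, not the authors' implementation (see their repository for that): each student layer receives a softmax weight distribution over all teacher layers, and the cross-layer loss is the attention-weighted sum of per-pair feature discrepancies. The feature shapes, similarity score, and loss form here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature vectors: 3 student layers and 4 teacher
# layers, each already projected to a common embedding dimension of 8.
student_feats = rng.standard_normal((3, 8))
teacher_feats = rng.standard_normal((4, 8))

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Soft layer association: for each student layer, turn its similarity
# scores against every teacher layer into attention weights.
attn = np.stack([softmax(student_feats[s] @ teacher_feats.T)
                 for s in range(student_feats.shape[0])])

# Cross-layer distillation loss: attention-weighted MSE over all
# student-teacher layer pairs (each row of `attn` sums to 1).
loss = sum(attn[s, t] * np.mean((student_feats[s] - teacher_feats[t]) ** 2)
           for s in range(3) for t in range(4))
```

In the paper's setting the per-pair discrepancy is computed on full feature maps after learned projections; the sketch collapses that to pooled vectors to keep the association mechanism itself visible.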