Cross-Layer Distillation with Semantic Calibration

Authors: Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen

AAAI 2021, pp. 7028-7036 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention-based soft layer association mechanism for cross-layer distillation.
Researcher Affiliation | Collaboration | 1. College of Computer Science, Zhejiang University, China. 2. Zhejiang Provincial Key Laboratory of Service Robot. 3. Zhejiang University-Lianlian Pay Joint Research Center. 4. College of Computer Science, Zhejiang University of Technology, China.
Pseudocode | Yes | Algorithm 1: Semantic Calibration for Distillation.
Open Source Code | Yes | The code is available at https://github.com/DefangChen/SemCKD.
Open Datasets | Yes | We conduct a series of classification tasks on the CIFAR-100 (Krizhevsky and Hinton 2009) and ImageNet (Russakovsky et al. 2015) datasets.
Dataset Splits | No | The paper mentions using the CIFAR-100 and ImageNet datasets and refers to a "training dataset D" and a "mini-batch with size b", but does not explicitly state the percentages or counts for training, validation, or test splits. It only mentions "Top-1 test accuracy".
Hardware Specification | No | The detailed descriptions of computing infrastructure, network architectures, data processing, and hyper-parameters in model optimization for reproducibility, as well as more results, are included in the technical appendix.
Software Dependencies | No | The detailed descriptions of computing infrastructure, network architectures, data processing, and hyper-parameters in model optimization for reproducibility, as well as more results, are included in the technical appendix.
Experiment Setup | Yes | The range of hyper-parameter β for SemCKD is set as 100 to 1100 at equal intervals of 100, while the hyper-parameter β for CRD ranges from 0.5 to 1.5 at equal intervals of 0.1, adopting the same search space as the original paper (Tian, Krishnan, and Isola 2020).
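To make the "attention-based soft layer association" named in the table concrete, below is a minimal NumPy sketch of the general idea, not the authors' implementation (see their repository for that): each student layer receives a softmax weight distribution over all teacher layers, and the cross-layer loss is the attention-weighted sum of per-pair feature discrepancies. The feature shapes, similarity score, and loss form here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature vectors: 3 student layers and 4 teacher
# layers, each already projected to a common embedding dimension of 8.
student_feats = rng.standard_normal((3, 8))
teacher_feats = rng.standard_normal((4, 8))

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Soft layer association: for each student layer, turn its similarity
# scores against every teacher layer into attention weights.
attn = np.stack([softmax(student_feats[s] @ teacher_feats.T)
                 for s in range(student_feats.shape[0])])

# Cross-layer distillation loss: attention-weighted MSE over all
# student-teacher layer pairs (each row of `attn` sums to 1).
loss = sum(attn[s, t] * np.mean((student_feats[s] - teacher_feats[t]) ** 2)
           for s in range(3) for t in range(4))
```

In the paper's setting the per-pair discrepancy is computed on full feature maps after learned projections; the sketch collapses that to pooled vectors to keep the association mechanism itself visible.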