Hierarchical Self-supervised Augmented Knowledge Distillation

Authors: Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct evaluations on the standard CIFAR-100 and ImageNet [Deng et al., 2009] benchmarks across widely used network families, including ResNet [He et al., 2016], WRN [Zagoruyko and Komodakis, 2016], VGG [Simonyan and Zisserman, 2015], MobileNet [Sandler et al., 2018], and ShuffleNet [Zhang et al., 2018; Ma et al., 2018]. Representative KD methods including KD [Hinton et al., 2015], FitNet [Romero et al., 2015], AT [Zagoruyko and Komodakis, 2017], AB [Heo et al., 2019], VID [Ahn et al., 2019], RKD [Park et al., 2019], SP [Tung and Mori, 2019], CC [Peng et al., 2019], CRD [Tian et al., 2020], and the SOTA SSKD [Xu et al., 2020] are compared. For a fair comparison, all comparative methods are combined with conventional KD by default, and we adopt rotations {0°, 90°, 180°, 270°} as the self-supervised auxiliary task, the same as SSKD. We use the standard training settings following [Xu et al., 2020] and report the mean result with standard deviation over 3 runs.
Researcher Affiliation | Academia | Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Pseudocode | No | The paper describes its methods in text and figures but contains no formal pseudocode or algorithm block.
Open Source Code | Yes | Codes are available at https://github.com/winycg/HSAKD.
Open Datasets | Yes | We conduct evaluations on standard CIFAR-100 and ImageNet [Deng et al., 2009] benchmarks.
Dataset Splits | No | The paper states "We use the standard training settings following [Xu et al., 2020]" but does not give explicit train/validation/test splits (percentages or sample counts), nor does it cite where such splits are defined for reproduction.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as particular deep learning frameworks or libraries.
Experiment Setup | Yes | We use the standard training settings following [Xu et al., 2020] and report the mean result with standard deviation over 3 runs. The more detailed settings for reproducibility can be found in our released codes. Following wide practice, we set the hyper-parameter τ = 1 in the task loss and τ = 3 in the mimicry loss. Besides, we do not introduce other hyper-parameters.
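The self-supervised auxiliary task cited above (rotations {0°, 90°, 180°, 270°}, as in SSKD) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `four_way_rotations` is a hypothetical helper name, and the paper applies the transformation per training batch inside its framework.

```python
import numpy as np

def four_way_rotations(image):
    """Return the four rotated views {0°, 90°, 180°, 270°} of an HxW(xC) image,
    each paired with its rotation-class label, as used by the SSKD-style
    self-supervised auxiliary task. (Illustrative sketch only.)"""
    views = [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]
    labels = list(range(4))  # rotation class: 0 -> 0°, 1 -> 90°, 2 -> 180°, 3 -> 270°
    return views, labels

# Usage: a dummy CIFAR-sized 32x32 RGB image
img = np.zeros((32, 32, 3))
views, labels = four_way_rotations(img)
```

The auxiliary classifier is then trained to predict `labels` from `views`, alongside the main classification task.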
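The temperatures quoted in the setup (τ = 1 for the task loss, τ = 3 for the mimicry loss) follow the standard Hinton-style KD formulation: teacher and student logits are softened by τ before comparing their distributions. A minimal sketch of such a temperature-scaled mimicry loss, assuming plain logit arrays and including the conventional τ² gradient-scaling factor (not a reproduction of the authors' exact loss):

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-softened softmax along the last axis, numerically stabilized."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_mimicry_loss(student_logits, teacher_logits, tau=3.0):
    """KL divergence between the teacher's and student's temperature-softened
    distributions, averaged over the batch and scaled by tau**2 (the usual
    KD convention so gradient magnitudes stay comparable across temperatures)."""
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (tau ** 2) * kl.mean()
```

With identical student and teacher logits the loss is zero; it grows as the student's softened distribution drifts from the teacher's.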