Hierarchical Self-supervised Augmented Knowledge Distillation
Authors: Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct evaluations on standard CIFAR-100 and ImageNet [Deng et al., 2009] benchmarks across the widely applied network families including ResNet [He et al., 2016], WRN [Zagoruyko and Komodakis, 2016], VGG [Simonyan and Zisserman, 2015], MobileNet [Sandler et al., 2018] and ShuffleNet [Zhang et al., 2018; Ma et al., 2018]. Some representative KD methods including KD [Hinton et al., 2015], FitNet [Romero et al., 2015], AT [Zagoruyko and Komodakis, 2017], AB [Heo et al., 2019], VID [Ahn et al., 2019], RKD [Park et al., 2019], SP [Tung and Mori, 2019], CC [Peng et al., 2019], CRD [Tian et al., 2020] and SOTA SSKD [Xu et al., 2020] are compared. For a fair comparison, all comparative methods are combined with conventional KD by default, and we adopt rotations {0°, 90°, 180°, 270°} as the self-supervised auxiliary task, the same as SSKD. We use the standard training settings following [Xu et al., 2020] and report the mean result with a standard deviation over 3 runs. |
| Researcher Affiliation | Academia | 1Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 2University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper describes methods in text and uses figures but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Codes are available at https://github.com/winycg/HSAKD. |
| Open Datasets | Yes | We conduct evaluations on standard CIFAR-100 and Image Net [Deng et al., 2009] benchmarks |
| Dataset Splits | No | The paper states "We use the standard training settings following [Xu et al., 2020]" but does not explicitly provide train/validation/test splits (percentages or sample counts), nor does it cite where those splits are defined, so the splits cannot be reproduced from this paper alone. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as specific deep learning frameworks or libraries. |
| Experiment Setup | Yes | We use the standard training settings following [Xu et al., 2020] and report the mean result with a standard deviation over 3 runs. The more detailed settings for reproducibility can be found in our released codes. Following the wide practice, we set the hyper-parameter τ = 1 in task loss and τ = 3 in mimicry loss. Besides, we do not introduce other hyper-parameters. |
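The two technical ingredients quoted above (the {0°, 90°, 180°, 270°} rotation auxiliary task and the temperatures τ = 1 for the task loss and τ = 3 for the mimicry loss) can be illustrated with a minimal sketch. This is not the authors' released implementation (see their repository for that); it is a NumPy illustration of 4-way rotation view generation and a Hinton-style temperature-scaled KD loss, with function names (`rotate_batch`, `kd_mimicry_loss`) chosen here for exposition.

```python
import numpy as np

def rotate_batch(images):
    """Build the 4-way rotation views used as the self-supervised
    auxiliary task: each image rotated by 0, 90, 180, 270 degrees.
    images: (N, H, W, C) array -> ((4N, H, W, C) views, (4N,) rotation labels)."""
    views = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(views, axis=0), labels

def _softmax(logits, tau):
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_mimicry_loss(student_logits, teacher_logits, tau=3.0):
    """Hinton-style mimicry loss: KL divergence between temperature-softened
    teacher and student distributions, scaled by tau**2 so gradient magnitudes
    stay comparable across temperatures (the paper uses tau = 3 here)."""
    p_t = _softmax(teacher_logits, tau)
    log_ratio = np.log(p_t + 1e-12) - np.log(_softmax(student_logits, tau) + 1e-12)
    return tau ** 2 * np.mean(np.sum(p_t * log_ratio, axis=-1))
```

The task loss would use the same softmax with τ = 1 against ground-truth labels; no further hyper-parameters are introduced, matching the paper's claim.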