Adaptive Hierarchy-Branch Fusion for Online Knowledge Distillation
Authors: Linrui Gong, Shaohui Lin, Baochang Zhang, Yunhang Shen, Ke Li, Ruizhi Qiao, Bo Ren, Muqing Li, Zhou Yu, Lizhuang Ma
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of AHBF-OKD on different datasets, including CIFAR-10/100 and ImageNet 2012. |
| Researcher Affiliation | Collaboration | (1) East China Normal University, Shanghai, China; (2) Beihang University, China; (3) Tencent Youtu Lab, China; (4) Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE, China |
| Pseudocode | No | The paper describes the proposed method textually and visually with diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available at https://github.com/linruigong965/AHBF. |
| Open Datasets | Yes | We evaluate the proposed AHBF-OKD approach on three widely-used datasets, CIFAR-10/100 (Krizhevsky, Hinton et al. 2009) and ImageNet 2012 (Russakovsky et al. 2015). |
| Dataset Splits | No | The paper provides details on training parameters and procedures, but does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | Yes | The proposed AHBF-OKD is implemented by PyTorch 1.10 and MindSpore 1.7.0 (Huawei 2020), and trained on two NVIDIA 3090 GPUs. |
| Software Dependencies | Yes | The proposed AHBF-OKD is implemented by PyTorch 1.10 and MindSpore 1.7.0 (Huawei 2020)... |
| Experiment Setup | Yes | The branch number M and auxiliary block number a are both set to 4, unless otherwise specified. We use SGD with Nesterov momentum 0.9 as the optimizer, and the temperature τ is set to 3. For the CIFAR-10/100 datasets, we set the batch size to 128 and the initial learning rate to 0.1. The learning rate is decayed by 0.1 at epochs 150 and 225 with 300 epochs in total, and the weight decay is set to 5e-4. For ImageNet 2012, we set the batch size to 96; the learning rate is also initialized to 0.1 and decayed by 0.1 at epochs 30 and 60 with a total of 90 epochs. The weight decay is set to 1e-4. By default, the hyper-parameter E is set to 300 and 90 on CIFAR-10/100 and ImageNet 2012 respectively, and (λ1, λ2) is set to (4, 2). All results are averaged over 3 runs. (A minimal PyTorch sketch of this configuration follows the table.) |
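
The reported optimizer, learning-rate schedule, batch size, weight decay, and distillation temperature map directly onto a standard PyTorch training loop. The sketch below wires those reported CIFAR-100 values together (ImageNet 2012 values are noted in comments). The `TwoBranchNet` model, the plain cross-entropy-plus-KL loss, and the data pipeline are illustrative placeholders chosen so the snippet runs end to end; they are not the authors' AHBF-OKD architecture or loss, which is available in the linked repository.

```python
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyper-parameters as reported for CIFAR-10/100 (ImageNet 2012 values in comments).
TAU = 3.0                 # distillation temperature
EPOCHS = 300              # 90 on ImageNet 2012
BATCH_SIZE = 128          # 96 on ImageNet 2012
WEIGHT_DECAY = 5e-4       # 1e-4 on ImageNet 2012
MILESTONES = [150, 225]   # [30, 60] on ImageNet 2012

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

# Placeholder two-branch network: NOT the AHBF architecture, only a stand-in so the
# optimizer / schedule / temperature wiring can be exercised.
class TwoBranchNet(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branch_a = nn.Linear(32, num_classes)
        self.branch_b = nn.Linear(32, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        return self.branch_a(feats), self.branch_b(feats)

model = TwoBranchNet()
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9,
                nesterov=True, weight_decay=WEIGHT_DECAY)
scheduler = MultiStepLR(optimizer, milestones=MILESTONES, gamma=0.1)

def kd_loss(student_logits, teacher_logits, tau=TAU):
    """Temperature-scaled KL divergence commonly used for logit distillation."""
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau * tau

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        optimizer.zero_grad()
        logits_a, logits_b = model(images)
        # Illustrative objective only: task loss plus one-way distillation between branches.
        loss = F.cross_entropy(logits_a, labels) + kd_loss(logits_a, logits_b.detach())
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The branch number M = 4, the auxiliary block number a = 4, the hyper-parameter E, and the loss weights (λ1, λ2) = (4, 2) are specific to the authors' hierarchy-branch fusion and are therefore not reproduced in this generic sketch; consult the released code at https://github.com/linruigong965/AHBF for how they enter the objective.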