Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Authors: Lujun Li, Zhe Jin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results with different CNNs and Vision Transformer models. |
| Researcher Affiliation | Academia | Lujun Li (1,2), Jin Zhe (1); (1) School of Artificial Intelligence, Anhui University, China; (2) Chinese Academy of Sciences, China. Contact: lilujunai@gmail.com; jinzhe@ahu.edu.cn |
| Pseudocode | No | The paper describes algorithms and methods but does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is made publicly available at https://lilujunai.github.io/SHAKE/. |
| Open Datasets | Yes | We conduct extensive experiments on multiple tasks (e.g., classification and detection) and datasets (e.g., CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet, and MS-COCO) to verify the superiority of the proposed method. |
| Dataset Splits | Yes | With CRD's settings [54], whose training epochs are 240, we perform experiments on several teacher-student models on CIFAR-100, either using the same architecture style or a different one. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. Initialized at 0.05, the multi-step learning rate decrements by 0.1 at 150, 180, and 210 epochs. (See the training-schedule sketch after the table.) |
| Hardware Specification | Yes | Training time is measured on a single 2080Ti GPU; the reported ratios represent improvements over KD. |
| Software Dependencies | No | The paper mentions frameworks like 'Detectron2' but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We choose λ and τ to be 1 and 4 in SHAKE, respectively. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. Initialized at 0.05, the multi-step learning rate decrements by 0.1 at 150, 180, and 210 epochs. (See the distillation-loss sketch after the table for where λ and τ enter.) |
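
The CIFAR-100 recipe quoted in the Dataset Splits and Experiment Setup rows maps directly onto a standard PyTorch optimizer and scheduler. The sketch below is illustrative rather than the authors' released code: the momentum value (0.9) and the `build_training` helper name are assumptions, while the SGD optimizer, weight decay 0.0005, mini-batch size 64, initial learning rate 0.05, decay by 0.1 at epochs 150, 180, and 210, and the 240 total epochs come from the quoted setup.

```python
# Illustrative training schedule matching the quoted CIFAR-100 setup.
# Assumption: momentum 0.9 (a common default; the report does not state it).
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR


def build_training(student: torch.nn.Module):
    # SGD with initial LR 0.05 and weight decay 5e-4, as quoted above.
    optimizer = SGD(student.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
    # Multi-step decay: LR multiplied by 0.1 at epochs 150, 180, and 210.
    scheduler = MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)
    return optimizer, scheduler


# Training loop skeleton: 240 epochs, mini-batch size 64.
# for epoch in range(240):
#     for images, labels in train_loader:  # DataLoader with batch_size=64
#         ...train step...
#     scheduler.step()
```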
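The Experiment Setup row's λ = 1 and τ = 4 are the distillation weight and the softmax temperature. SHAKE itself adds a shadow head that bridges offline and online transfer, which is not reproduced here; the sketch below is only an assumption about where these two hyperparameters enter, shown via the standard temperature-scaled KD loss.

```python
import torch
import torch.nn.functional as F


def kd_objective(student_logits, teacher_logits, labels, lam=1.0, tau=4.0):
    """Cross-entropy plus a temperature-scaled KD term (not the full SHAKE loss).

    lam corresponds to λ = 1 and tau to τ = 4 in the quoted setup; the tau**2
    factor keeps gradient magnitudes comparable across temperatures.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits.detach() / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return hard + lam * soft
```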