Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer

Authors: Lujun Li, Zhe Jin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results with different CNNs and Vision Transformer models.
Researcher Affiliation | Academia | Lujun Li (1,2), Zhe Jin (1); 1: School of Artificial Intelligence, Anhui University, China; 2: Chinese Academy of Sciences, China; lilujunai@gmail.com; jinzhe@ahu.edu.cn
Pseudocode | No | The paper describes algorithms and methods but does not include a formal 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code is made publicly available at https://lilujunai.github.io/SHAKE/.
Open Datasets | Yes | We conduct extensive experiments on multiple tasks (e.g., classification and detection) and datasets (e.g., CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet, and MS-COCO) to verify the superiority of the proposed method.
Dataset Splits | Yes | Following CRD's settings [54], with 240 training epochs, we perform experiments on several teacher-student pairs on CIFAR-100, using either the same or a different architecture style. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. The learning rate is initialized at 0.05 and decays by a factor of 0.1 at epochs 150, 180, and 210 (see the training-schedule sketch after the table).
Hardware Specification | Yes | Training time is measured on a single 2080Ti GPU, and the reported ratios represent improvements over KD.
Software Dependencies | No | The paper mentions frameworks such as Detectron2 but does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We set λ = 1 and τ = 4 in SHAKE. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. The learning rate is initialized at 0.05 and decays by a factor of 0.1 at epochs 150, 180, and 210 (see the loss-weighting sketch after the table).
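
The training schedule quoted under "Dataset Splits" and "Experiment Setup" maps onto a standard PyTorch optimizer/scheduler configuration. The sketch below is a minimal illustration of that schedule, assuming PyTorch and torchvision; the ResNet-18 student, the momentum value of 0.9, and the skeleton training loop are assumptions and are not taken from the paper.

import torch
import torchvision

# Hypothetical student network for CIFAR-100 (100 classes); the paper evaluates
# several teacher-student pairs, so this choice is only illustrative.
student = torchvision.models.resnet18(num_classes=100)

# SGD with weight decay 0.0005 and initial learning rate 0.05, as quoted;
# the momentum value is an assumption, since the quote does not state it.
optimizer = torch.optim.SGD(student.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)

# Multi-step decay by a factor of 0.1 at epochs 150, 180, and 210,
# over 240 training epochs (the CRD schedule).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    # ... one training epoch over CIFAR-100 with mini-batch size 64 ...
    scheduler.step()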
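
The λ = 1 and τ = 4 values quoted under "Experiment Setup" correspond to the weight on the distillation term and the softmax temperature in a temperature-scaled KD objective. The sketch below shows only that generic Hinton-style loss, not SHAKE's shadow-head distillation; the function name kd_loss and its signature are hypothetical.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, lam=1.0, tau=4.0):
    # Cross-entropy with the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # KL divergence between temperature-softened distributions,
    # scaled by tau**2 to keep gradient magnitudes comparable.
    kld = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                   F.softmax(teacher_logits / tau, dim=1),
                   reduction="batchmean") * (tau ** 2)
    # Total loss: cross-entropy plus the lambda-weighted distillation term.
    return ce + lam * kld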