Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Authors: Lujun Li, Zhe Jin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results with different CNNs and Vision Transformer models. |
| Researcher Affiliation | Academia | Lujun Li (1,2), Jin Zhe (1); School of Artificial Intelligence, Anhui University, China; Chinese Academy of Science, China; EMAIL; EMAIL |
| Pseudocode | No | The paper describes algorithms and methods but does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is made publicly available at https://lilujunai.github.io/SHAKE/. |
| Open Datasets | Yes | We conduct extensive experiments on multiple tasks (e.g., classification and detection) and datasets (e.g., CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet, and MS-COCO) to verify the superiority of the proposed method. |
| Dataset Splits | Yes | With CRD's settings [54], whose training epochs are 240, we perform experiments on several teacher-student models on CIFAR-100, either using the same architecture style or a different one. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. Initialized at 0.05, the multi-step learning rate decrements by 0.1 at 150, 180, and 210 epochs. |
| Hardware Specification | Yes | Training time is measured on a single 2080Ti GPU, and represents the improving ratios than KD. |
| Software Dependencies | No | The paper mentions frameworks like 'Detectron2' but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We choose λ and τ to be 1 and 4 in SHAKE, respectively. We employ a conventional SGD optimizer with a weight decay of 0.0005 and a mini-batch size of 64. Initialized at 0.05, the multi-step learning rate decrements by 0.1 at 150, 180, and 210 epochs. |
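
The learning-rate schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative reading of the quoted text, not the authors' code: it assumes that "decrements by 0.1" means the rate is multiplied by 0.1 at each milestone, that epochs are counted from 0, and that the drop takes effect at the start of the milestone epoch. The names `BASE_LR`, `DECAY_FACTOR`, `MILESTONES`, `TOTAL_EPOCHS`, and `lr_at_epoch` are hypothetical.

```python
# Values quoted in the table: initial rate 0.05, milestones at 150/180/210,
# 240 training epochs in total (CRD's settings on CIFAR-100).
BASE_LR = 0.05
DECAY_FACTOR = 0.1  # assumption: multiplicative factor applied at each milestone
MILESTONES = (150, 180, 210)
TOTAL_EPOCHS = 240


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch under the assumed schedule."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    # Count how many milestones have been reached by this epoch.
    drops = sum(1 for milestone in MILESTONES if epoch >= milestone)
    return BASE_LR * DECAY_FACTOR ** drops


if __name__ == "__main__":
    # Print the rate just before and at each milestone.
    for epoch in (0, 149, 150, 179, 180, 209, 210, 239):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

Under these assumptions the rate stays at 0.05 for the first 150 epochs and is roughly 0.00005 for the final 30. In a PyTorch training loop, the same behavior would typically come from a multi-step scheduler rather than a hand-written function.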