Respecting Transfer Gap in Knowledge Distillation

Authors: Yulei Niu, Long Chen, Chang Zhou, Hanwang Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on CIFAR-100 and ImageNet demonstrate the effectiveness of IPWD for both two-stage distillation and one-stage self-distillation. We take the image classification task as a case study to evaluate the effectiveness and generalizability of our IPWD. Following previous works [57, 73, 28], we conduct experiments with two settings, two-stage distillation and one-stage self-distillation.
Researcher Affiliation | Collaboration | Yulei Niu (1); affiliations: 1 Columbia University, 2 Damo Academy, Alibaba Group, 3 Nanyang Technological University; emails: {yn.yuleiniu,zjuchenlong}@gmail.com, zhouchang.zc@alibaba-inc.com, hanwangzhang@ntu.edu.sg
Pseudocode | No | The paper describes its method using mathematical equations and textual explanations, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Please refer to the supplemental material.
Open Datasets | Yes | We conducted experiments on CIFAR-100 [29] and ImageNet [10]. CIFAR-100 contains 50K images in the training set and 10K images in the test set from 100 classes. ImageNet provides 1.2M images in the training set and 50K images in the validation set from 1K classes.
Dataset Splits | Yes | CIFAR-100 contains 50K images in the training set and 10K images in the test set from 100 classes. ImageNet provides 1.2M images in the training set and 50K images in the validation set from 1K classes. (A loader sketch illustrating these splits appears after the table.)
Hardware Specification | No | The paper states 'See Appendix' for hardware specifications ('Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix.'), but the appendix content is not provided in the given text.
Software Dependencies | No | The paper mentions building on 'open-sourced code' from other works (e.g., 'For experiments on CIFAR-100, we followed CRD [57] based on the open-sourced code'), but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | For experiments on CIFAR-100, we followed CRD [57] based on the open-sourced code. We set the trade-off hyper-parameter α = 5 in Eq. (7) and the temperature τ = 10. Other training details were the same as CRD [57] and provided in the appendix. For ImageNet, we followed Zhou et al. [73] to conduct experiments based on their open-sourced code. We used the same hyper-parameters as WSLD [73], i.e., α as 2.5 and τ as 2. (A hedged sketch of how these hyper-parameters enter a distillation loss follows the table.)
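
Since the paper itself provides no pseudocode (see the Pseudocode row), the following is a minimal PyTorch sketch of the standard temperature-scaled distillation objective that the reported hyper-parameters plug into: α trades off the soft-label term against cross-entropy, and τ softens the logits. It uses the CIFAR-100 setting (α = 5, τ = 10) as defaults and deliberately omits IPWD's per-sample inverse-probability weighting; the function and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=5.0, tau=10.0):
    """Standard temperature-scaled KD objective: CE + alpha * KL.

    Defaults follow the CIFAR-100 setting quoted above (alpha = 5, tau = 10);
    the ImageNet setting would be alpha = 2.5, tau = 2. This sketch omits
    IPWD's per-sample inverse-probability weights.
    """
    # Hard-label cross-entropy on the student's raw logits.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label KL divergence between temperature-softened distributions;
    # the tau**2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    return ce + alpha * kl
```

Switching to the quoted ImageNet/WSLD setting is then just a matter of calling kd_loss(..., alpha=2.5, tau=2.0).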
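
For the Dataset Splits row, a small check with torchvision's packaged CIFAR-100 confirms the quoted 50K/10K split. This is an assumption about tooling (the paper builds on CRD's and WSLD's released code, whose data pipelines may differ); ImageNet's 1.2M/50K split cannot be auto-downloaded and is only noted in a comment.

```python
import torchvision
import torchvision.transforms as T

# Minimal sketch, assuming torchvision's CIFAR-100 wrapper; train=True/False
# selects the 50K-image training set and the 10K-image test set quoted above.
transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR100(
    root="./data", train=False, download=True, transform=transform)
assert len(train_set) == 50_000 and len(test_set) == 10_000

# ImageNet (1.2M train / 50K val images, 1K classes) is not auto-downloadable;
# torchvision.datasets.ImageNet(root, split="train" or "val") expects the
# official archives to be present locally.
```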