Online Knowledge Distillation with Diverse Peers

Authors: Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun Chen

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experimental results in this section to evaluate the performance of the proposed approach for image classification. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
Researcher Affiliation | Collaboration | Defang Chen,1,2 Jian-Ping Mei,3* Can Wang,1,2 Yan Feng,1,2 Chun Chen1,2 1College of Computer Science, Zhejiang University, Hangzhou, China. 2ZJU-Lianlian Pay Joint Research Center. 3College of Computer Science, Zhejiang University of Technology, Hangzhou, China.
Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | Codes will be released once the paper is accepted.
Open Datasets | Yes | CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) both contain 50,000/10,000 training/testing colored natural images with 32 × 32 pixels, which are drawn from 10/100 classes. ImageNet-2012 (Russakovsky et al. 2015) is a more challenging dataset consisting of about 1.3 million training images and 50 thousand validation images from 1000 classes.
Dataset Splits | Yes | CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) both contain 50,000/10,000 training/testing colored natural images with 32 × 32 pixels, which are drawn from 10/100 classes. ImageNet-2012 (Russakovsky et al. 2015) is a more challenging dataset consisting of about 1.3 million training images and 50 thousand validation images from 1000 classes.
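Since no code is released and the data pipeline is not described, the following is a minimal sketch, assuming PyTorch/torchvision, of how the fixed 50,000/10,000 CIFAR-100 train/test split quoted above is typically loaded; the augmentation choices, normalization statistics, and the helper name `build_cifar100_loaders` are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (assumption): standard CIFAR-100 loaders with the usual
# 32x32 augmentation; the paper does not specify its data pipeline.
import torch
from torchvision import datasets, transforms


def build_cifar100_loaders(root="./data", batch_size=128, num_workers=4):
    # Per-channel statistics commonly used to normalize CIFAR-100 (assumption).
    mean, std = (0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)
    train_tf = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
    test_tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
    # CIFAR-100 ships with a fixed 50,000/10,000 train/test split.
    train_set = datasets.CIFAR100(root, train=True, download=True, transform=train_tf)
    test_set = datasets.CIFAR100(root, train=False, download=True, transform=test_tf)
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_loader = torch.utils.data.DataLoader(
        test_set, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_loader, test_loader
```

The same pattern applies to CIFAR-10 (`datasets.CIFAR10`) and, with the standard ImageNet folder layout, to ImageNet-2012.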
Hardware Specification | No | The paper mentions 'computing resources' in the Acknowledgments but does not provide specific hardware details such as GPU or CPU models used for experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions).
Experiment Setup | Yes | We use stochastic gradient descent with Nesterov momentum for optimization and set the initial learning rate to 0.1 and the momentum to 0.9. For the CIFAR-10/CIFAR-100 datasets, we set the mini-batch size to 128 and the weight decay to 5 × 10−4. The learning rate is divided by 10 at epochs 150 and 225 of the total 300 training epochs for these two datasets. For the ImageNet-2012 dataset, we set the mini-batch size to 256 and the weight decay to 1 × 10−4, and the learning rate is divided by 10 at epochs 30 and 60 of the total 90 training epochs.
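The paper does not name its framework, so the following is a minimal PyTorch sketch of the reported CIFAR optimization schedule only (SGD with Nesterov momentum 0.9, initial learning rate 0.1, weight decay 5 × 10−4, learning rate divided by 10 at epochs 150 and 225 of 300); the paper's distillation losses are omitted, and `model` and `train_loader` are placeholders, not artifacts from the authors.

```python
# Minimal sketch (assumption): the reported CIFAR training schedule in PyTorch.
# The proposed two-level distillation losses are intentionally not modeled here.
import torch
import torch.nn as nn


def train_cifar(model, train_loader, device="cuda"):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                weight_decay=5e-4, nesterov=True)
    # Divide the learning rate by 10 at epochs 150 and 225 of 300 total epochs.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150, 225], gamma=0.1)
    model.to(device)
    for epoch in range(300):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

For ImageNet-2012 the quoted setup instead uses a mini-batch size of 256, weight decay 1 × 10−4, and milestones [30, 60] over 90 epochs; substituting those values into the sketch above is straightforward.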