Online Knowledge Distillation with Diverse Peers
Authors: Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, Chun Chen (pp. 3430–3437)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results in this section to evaluate the performance of the proposed approach for image classification. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework. |
| Researcher Affiliation | Collaboration | Defang Chen,1,2 Jian-Ping Mei,3* Can Wang,1,2 Yan Feng,1,2 Chun Chen1,2 1College of Computer Science, Zhejiang University, Hangzhou, China. 2ZJU-Lianlian Pay Joint Research Center. 3College of Computer Science, Zhejiang University of Technology, Hangzhou, China. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Codes will be released once the paper is accepted. |
| Open Datasets | Yes | CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) both contain 50,000/10,000 training/testing colored natural images with 32 × 32 pixels, drawn from 10/100 classes. ImageNet-2012 (Russakovsky et al. 2015) is a more challenging dataset consisting of about 1.3 million training images and 50 thousand validation images from 1000 classes. |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton 2009) both contain 50,000/10,000 training/testing colored natural images with 32 × 32 pixels, drawn from 10/100 classes. ImageNet-2012 (Russakovsky et al. 2015) is a more challenging dataset consisting of about 1.3 million training images and 50 thousand validation images from 1000 classes. |
| Hardware Specification | No | The paper mentions 'computing resources' in the Acknowledgments but does not provide specific hardware details such as GPU or CPU models used for experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We use stochastic gradient descent with Nesterov momentum for optimization and set the initial learning rate to 0.1 and momentum to 0.9. For the CIFAR-10/CIFAR-100 datasets, we set the mini-batch size to 128 and weight decay to 5 × 10⁻⁴. The learning rate is divided by 10 at epochs 150 and 225 of the total 300 training epochs for these two datasets. For the ImageNet-2012 dataset, we set the mini-batch size to 256 and the weight decay to 1 × 10⁻⁴, and the learning rate is divided by 10 at epochs 30 and 60 of the total 90 training epochs. |
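
The dataset and split details reported in the Open Datasets and Dataset Splits rows map directly onto standard torchvision loaders. The sketch below is illustrative only (the authors' code had not been released at submission time); the augmentation choices, the CIFAR-100 focus, and the loader settings other than the batch size of 128 are assumptions.

```python
# Illustrative sketch, not the authors' code: loading the CIFAR-100 train/test split
# described in the table with torchvision. Augmentation choices are assumptions.
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Common CIFAR augmentation: random crop with padding and horizontal flip (assumed).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
test_transform = transforms.ToTensor()

# CIFAR-100: 50,000 training and 10,000 test images of 32 x 32 pixels from 100 classes.
train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR100(
    root="./data", train=False, download=True, transform=test_transform)

# Mini-batch size 128 for CIFAR, as reported in the Experiment Setup row.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False, num_workers=4)
```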
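
The optimization settings in the Experiment Setup row translate into the PyTorch sketch below. It covers only the reported optimizer and learning-rate schedule for CIFAR, with a plain cross-entropy loss standing in for the paper's two-level distillation objective; the ResNet-18 placeholder network and the reuse of `train_loader` from the dataset sketch above are assumptions.

```python
# Illustrative sketch of the reported CIFAR optimization schedule (assumed PyTorch
# implementation, not the authors' code): SGD with Nesterov momentum, initial learning
# rate 0.1, momentum 0.9, weight decay 5e-4, learning rate divided by 10 at epochs
# 150 and 225 of 300. The distillation loss itself is omitted here.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(num_classes=100)  # placeholder student/peer network
criterion = nn.CrossEntropyLoss()  # stand-in for the two-level distillation objective

optimizer = optim.SGD(
    model.parameters(),
    lr=0.1,                # initial learning rate
    momentum=0.9,
    nesterov=True,         # Nesterov momentum, as stated in the paper
    weight_decay=5e-4,     # CIFAR weight decay; the paper uses 1e-4 for ImageNet-2012
)

# Divide the learning rate by 10 at epochs 150 and 225
# (epochs 30 and 60 of 90 for ImageNet-2012).
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)

for epoch in range(300):
    model.train()
    for images, labels in train_loader:  # train_loader from the dataset sketch above
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the schedule once per epoch
```

For ImageNet-2012, the same structure would apply with a mini-batch size of 256, weight decay 1e-4, milestones [30, 60], and 90 total epochs, per the Experiment Setup row.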