Curriculum Temperature for Knowledge Distillation

Authors: Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method.
Researcher Affiliation | Collaboration | Zheng Li¹, Xiang Li¹*, Lingfeng Yang², Borui Zhao³, Renjie Song³, Lei Luo², Jun Li², Jian Yang¹*; ¹Nankai University, ²Nanjing University of Science and Technology, ³Megvii Technology
Pseudocode | Yes | Algorithm 1: Curriculum Temperature Distillation
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | The CIFAR-100 dataset consists of colored natural images with 32×32 pixels. The training and testing sets contain 50K and 10K images, respectively. ImageNet-2012 (Deng et al. 2009) contains 1.2M images for training, and 50K for validation, from 1K classes. MS-COCO (Lin et al. 2014) is an 80-category general object detection dataset. The train2017 split contains 118k images, and the val2017 split contains 5k images.
Dataset Splits | Yes | The CIFAR-100 dataset consists of colored natural images with 32×32 pixels. The training and testing sets contain 50K and 10K images, respectively. ImageNet-2012 (Deng et al. 2009) contains 1.2M images for training, and 50K for validation, from 1K classes. MS-COCO (Lin et al. 2014) is an 80-category general object detection dataset. The train2017 split contains 118k images, and the val2017 split contains 5k images.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper states 'All details are attached in supplement due to the page limit' but does not provide specific software dependencies with version numbers in the main text.
Experiment Setup | Yes | In our method, we default to set λ_max, λ_min and E_loops to 1, 0 and 10, respectively. ... The optimization process for Eqn. (4) and Eqn. (5) can be conducted via stochastic gradient descent (SGD).
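
The Experiment Setup row gives only the default values of λ_max, λ_min, and E_loops, not how λ moves between them. As a rough illustration, the sketch below assumes a cosine ramp that raises λ from λ_min to λ_max over the first E_loops epochs and then holds it. The schedule shape and the function name `curriculum_lambda` are assumptions made for this example and are not taken from the summary above.

```python
import math


def curriculum_lambda(epoch: int,
                      lam_min: float = 0.0,
                      lam_max: float = 1.0,
                      e_loops: int = 10) -> float:
    """Hypothetical cosine ramp from lam_min to lam_max over e_loops epochs.

    The defaults (0, 1, 10) are the values quoted in the Experiment Setup row;
    the cosine form itself is an assumption of this sketch.
    """
    if e_loops <= 0:
        raise ValueError("e_loops must be positive")
    # Fraction of the ramp completed, clamped to [0, 1].
    progress = min(max(epoch, 0), e_loops) / e_loops
    # cos((1 + progress) * pi) goes from -1 at progress=0 to +1 at progress=1.
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos((1.0 + progress) * math.pi))


if __name__ == "__main__":
    for epoch in (0, 5, 10, 20):
        print(epoch, round(curriculum_lambda(epoch), 4))
```

With the quoted defaults, λ starts at λ_min, reaches the midpoint halfway through the ramp, and stays at λ_max for every epoch from E_loops onward.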