Curriculum Temperature for Knowledge Distillation
Authors: Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | Zheng Li¹, Xiang Li¹*, Lingfeng Yang², Borui Zhao³, Renjie Song³, Lei Luo², Jun Li², Jian Yang¹*; ¹Nankai University, ²Nanjing University of Science and Technology, ³Megvii Technology |
| Pseudocode | Yes | Algorithm 1: Curriculum Temperature Distillation |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | The CIFAR-100 dataset consists of colored natural images with 32×32 pixels. The training and testing sets contain 50K and 10K images, respectively. ImageNet-2012 (Deng et al. 2009) contains 1.2M images for training, and 50K for validation, from 1K classes. MSCOCO (Lin et al. 2014) is an 80-category general object detection dataset. The train2017 split contains 118k images, and the val2017 split contains 5k images. |
| Dataset Splits | Yes | The CIFAR-100 dataset consists of colored natural images with 32×32 pixels. The training and testing sets contain 50K and 10K images, respectively. ImageNet-2012 (Deng et al. 2009) contains 1.2M images for training, and 50K for validation, from 1K classes. MSCOCO (Lin et al. 2014) is an 80-category general object detection dataset. The train2017 split contains 118k images, and the val2017 split contains 5k images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states 'All details are attached in supplement due to the page limit' but does not provide specific software dependencies with version numbers in the main text. |
| Experiment Setup | Yes | In our method, we default to set λ_max, λ_min, and E_loops to 1, 0, and 10, respectively. ... The optimization process for Eqn. (4) and Eqn. (5) can be conducted via stochastic gradient descent (SGD). |
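
The Experiment Setup row gives the default values (λ_max = 1, λ_min = 0, E_loops = 10) but not the schedule that uses them. The sketch below shows one plausible reading: a cosine ramp that raises λ from λ_min at epoch 0 to λ_max at epoch E_loops and holds it there afterwards. The cosine form, the function name `curriculum_lambda`, and its parameter names are assumptions for illustration; they are not quoted in the table above and should be checked against the paper.

```python
import math


def curriculum_lambda(epoch, lam_min=0.0, lam_max=1.0, e_loops=10):
    """Hypothetical cosine ramp for the curriculum weight lambda.

    Assumption: lambda rises from lam_min at epoch 0 to lam_max at
    epoch e_loops, then stays at lam_max. Defaults mirror the values
    quoted in the Experiment Setup row (1, 0, and 10).
    """
    # Fraction of the ramp completed, clamped to [0, 1].
    progress = min(epoch, e_loops) / e_loops
    # cos((1 + progress) * pi) goes from -1 (progress = 0) to +1 (progress = 1).
    return lam_min + 0.5 * (lam_max - lam_min) * (1.0 + math.cos((1.0 + progress) * math.pi))


if __name__ == "__main__":
    # Print the weight for the first few epochs of a run.
    for epoch in range(13):
        print(epoch, round(curriculum_lambda(epoch), 3))
```

Under these assumptions the weight starts at λ_min, reaches the midpoint halfway through the ramp, and saturates at λ_max once `epoch` reaches `e_loops`, so later epochs all train at the full difficulty level.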