Teaching What You Should Teach: A Data-Based Distillation Method

Authors: Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
Researcher Affiliation | Academia | (1) Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, China; (2) University of Science and Technology of China, Hefei, China; (3) Hunan University, Hunan, China; (4) Tsinghua University, Beijing, China
Pseudocode | Yes | The more detailed algorithmic procedure of TST can be found in Appendix C.
Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code for the described methodology is released.
Open Datasets | Yes | We conduct extensive experiments on image classification (CIFAR-100 [Krizhevsky and Hinton, 2009] and ImageNet-1k [Russakovsky et al., 2015]), object detection (MS-COCO [Lin et al., 2014]), and semantic segmentation (Cityscapes [Cordts et al., 2016]) tasks.
Dataset Splits | Yes | Table 2: Results on the ImageNet validation set. Table 3: Results on the COCO validation set. Table 4: Results on the Cityscapes validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions "Torchvision [Marcel and Rodriguez, 2010]" and "mmrazor [Contributors, 2021]" but does not specify version numbers for these or other key software components, such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We apply batch size 128 and initial learning rate 0.1 on CIFAR-100. We follow the settings in [Huang et al., 2022] for the ResNet34-ResNet18 pair and the ResNet50-MobileNet pair on ImageNet-1k. The settings for the other classification, detection, and segmentation tasks can be found in Appendix B.
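
The quoted setup fixes only the batch size and the initial learning rate for CIFAR-100; the rest of the recipe is deferred to Appendix B and [Huang et al., 2022], and no code is released. The snippet below is a minimal sketch of how such a CIFAR-100 distillation run could be assembled with PyTorch/Torchvision, assuming Hinton-style soft-target distillation as a placeholder for the unreleased TST loss; the teacher/student pair, optimizer hyperparameters, schedule, temperature, and epoch count shown are assumptions, not the paper's settings.

```python
# Hypothetical reconstruction of the quoted CIFAR-100 setup (batch size 128, initial LR 0.1).
# The TST-specific loss is not publicly released, so Hinton-style soft-target KD stands in
# for it here; teacher/student choice, optimizer, schedule, temperature, and epoch count
# are assumptions, not the authors' exact recipe.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)  # quoted batch size

# Stand-in models: torchvision's ImageNet-style ResNets, randomly initialised for illustration.
teacher = models.resnet34(num_classes=100).eval()
student = models.resnet18(num_classes=100)

optimizer = torch.optim.SGD(student.parameters(), lr=0.1,      # quoted initial learning rate
                            momentum=0.9, weight_decay=5e-4)    # assumed values
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 180, 210], gamma=0.1)

T, alpha = 4.0, 0.9  # assumed distillation temperature and loss weight

for epoch in range(240):  # assumed number of epochs
    for images, labels in train_loader:
        with torch.no_grad():
            t_logits = teacher(images)          # teacher predictions, no gradient
        s_logits = student(images)
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * T * T
        ce = F.cross_entropy(s_logits, labels)
        loss = alpha * kd + (1 - alpha) * ce    # soft-target KD plus hard-label cross-entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```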