Teaching What You Should Teach: A Data-Based Distillation Method
Authors: Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our method, we conducted extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process. |
| Researcher Affiliation | Academia | 1Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, China 2University of Science and Technology of China, Hefei, China 3Hunan University, Hunan, China 4Tsinghua University, Beijing, China |
| Pseudocode | Yes | The more detailed algorithmic procedure of TST can be found in Appendix C. |
| Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code for the described methodology is released. |
| Open Datasets | Yes | We conduct extensive experiments on image classification (CIFAR-100 [Krizhevsky and Hinton, 2009] and ImageNet-1k [Russakovsky et al., 2015]), object detection (MS-COCO [Lin et al., 2014]), and semantic segmentation (Cityscapes [Cordts et al., 2016]) tasks. |
| Dataset Splits | Yes | Table 2: Results on the ImageNet validation set. Table 3: Results on the COCO validation set. Table 4: Results on the Cityscapes validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "Torchvision [Marcel and Rodriguez, 2010]" and "mmrazor [Contributors, 2021]" but does not specify version numbers for these or other key software components, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We apply batch size 128 and initial learning rate 0.1 on CIFAR-100. And we follow the settings in [Huang et al., 2022] for the ResNet34-ResNet18 pair and the ResNet50-MobileNet pair on ImageNet-1k. The settings of other classification, detection and segmentation tasks can be found in Appendix B. |