Knowledge Distillation Based on Transformed Teacher Matching
Authors: Kaixiang Zheng, En-Hui Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiment results demonstrate that thanks to this inherent regularization, TTM leads to trained students with better generalization than the original KD. To further enhance student's capability to match teacher's power transformed probability distribution, we introduce a sample-adaptive weighting coefficient into TTM, yielding a novel distillation approach dubbed weighted TTM (WTTM). It is shown, by comprehensive experiments, that although WTTM is simple, it is effective, improves upon TTM, and achieves state-of-the-art accuracy performance. |
| Researcher Affiliation | Academia | Kaixiang Zheng & En-Hui Yang, Department of Electrical and Computer Engineering, University of Waterloo, {k56zheng,ehyang}@uwaterloo.ca |
| Pseudocode | Yes | In this section, we provide the pseudo-code for TTM and WTTM in a PyTorch-like style, shown in Algorithm 1. It's clear that both TTM and WTTM are quite easy to implement. |
| Open Source Code | Yes | Our source code is available at https://github.com/zkxufo/TTM. |
| Open Datasets | Yes | We benchmark TTM and WTTM on two prevailing image classification datasets, namely CIFAR-100 and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | CIFAR-100 contains 60k 32×32 color images of 100 classes, with 600 images per class, and it's further split into 50k training images and 10k test images. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup. |
| Software Dependencies | No | The paper mentions 'torchdistill (Matsubara, 2021) library' and 'PyTorch (Paszke et al., 2019)' but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | Note that we list T and β values of all experiments in A.4 for reproducibility. |
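
The table quotes the paper's description of matching the teacher's "power transformed probability distribution" and of a "sample-adaptive weighting coefficient", but does not reproduce Algorithm 1. The following is a minimal pure-Python sketch of what such a distillation term might look like, not the authors' implementation. The function names (`power_transform`, `distill_term`), the exponent name `gamma`, and the choice of the power sum as the per-sample weight are assumptions made for illustration; the authoritative version is Algorithm 1 and the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def power_transform(probs, gamma):
    """Raise each probability to `gamma` and renormalize.

    Returns the transformed distribution and the power sum (the
    normalizer), which this sketch reuses as a per-sample weight.
    """
    powered = [p ** gamma for p in probs]
    power_sum = sum(powered)
    return [v / power_sum for v in powered], power_sum


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distill_term(student_logits, teacher_logits, gamma, weighted=False):
    """Distillation term for one sample (illustrative, hypothetical API).

    The student output is left untempered; only the teacher distribution
    is power transformed. With weighted=True the KL term is scaled by the
    teacher's power sum (assumed form of the sample-adaptive weight).
    """
    q = softmax(student_logits)
    p_t, power_sum = power_transform(softmax(teacher_logits), gamma)
    kl = kl_divergence(p_t, q)
    return power_sum * kl if weighted else kl
```

Power transforming a softmax output with exponent `gamma = 1 / T` gives the same distribution as applying temperature `T` to the teacher logits, which is how the `T` values mentioned in the Experiment Setup row would map onto this sketch. A full training objective would presumably add a cross-entropy term and scale the distillation term by β; that combination is omitted here.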