Knowledge Distillation Based on Transformed Teacher Matching

Authors: Kaixiang Zheng, En-Hui Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiment results demonstrate that thanks to this inherent regularization, TTM leads to trained students with better generalization than the original KD. To further enhance student's capability to match teacher's power transformed probability distribution, we introduce a sample-adaptive weighting coefficient into TTM, yielding a novel distillation approach dubbed weighted TTM (WTTM). It is shown, by comprehensive experiments, that although WTTM is simple, it is effective, improves upon TTM, and achieves state-of-the-art accuracy performance." (See the power-transform note after this table.)
Researcher Affiliation | Academia | "Kaixiang Zheng & En-Hui Yang, Department of Electrical and Computer Engineering, University of Waterloo, {k56zheng,ehyang}@uwaterloo.ca"
Pseudocode | Yes | "In this section, we provide the pseudo-code for TTM and WTTM in a PyTorch-like style, shown in Algorithm 1. It's clear that both TTM and WTTM are quite easy to implement." (See the loss sketch after this table.)
Open Source Code | Yes | "Our source code is available at https://github.com/zkxufo/TTM."
Open Datasets | Yes | "We benchmark TTM and WTTM on two prevailing image classification datasets, namely CIFAR-100 and ImageNet (Deng et al., 2009)."
Dataset Splits | Yes | "CIFAR-100 contains 60k 32×32 color images of 100 classes, with 600 images per class, and it's further split into 50k training images and 10k test images." (See the data-loading sketch after this table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for the experimental setup.
Software Dependencies | No | The paper mentions the torchdistill (Matsubara, 2021) library and PyTorch (Paszke et al., 2019) but does not specify version numbers for them or for other software dependencies.
Experiment Setup | Yes | "Note that we list T and β values of all experiments in A.4 for reproducibility."
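The "Research Type" row quotes the abstract's notion of matching the teacher's power transformed probability distribution. As a reading aid only (an assumption based on the abstract's wording, not a formula copied from the paper), such a transform of a teacher distribution p with exponent γ can be written as

```latex
\tilde{p}_k = \frac{p_k^{\,\gamma}}{\sum_{j} p_j^{\,\gamma}}, \qquad 0 < \gamma \le 1,
```

with TTM training the student to match \tilde{p} and WTTM additionally weighting each sample's matching term with a sample-adaptive coefficient.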
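The "Pseudocode" row reports that the paper's Algorithm 1 gives PyTorch-style pseudo-code for TTM and WTTM. Below is a minimal sketch consistent with that description, not a reproduction of Algorithm 1 (the reference implementation is at https://github.com/zkxufo/TTM); the function name `ttm_loss`, the default temperature, and the optional `sample_weight` argument are illustrative assumptions, and the exact sample-adaptive weight used by WTTM (controlled by β in the paper) is not reproduced here.

```python
# Minimal sketch of a TTM-style distillation term (assumption: the teacher's
# distribution is power transformed via a teacher-only temperature and then
# matched by the student's untempered softmax; WTTM rescales each sample's term).
import torch
import torch.nn.functional as F


def ttm_loss(student_logits, teacher_logits, T=4.0, sample_weight=None):
    """Cross-entropy between the transformed teacher and the student,
    equal to the KL divergence up to a teacher-only entropy constant."""
    # Power-transformed teacher: softmax(teacher_logits / T) is the teacher's
    # probability vector raised to the power 1/T and renormalized.
    teacher_prob = F.softmax(teacher_logits / T, dim=1)
    # Student keeps temperature 1 (the asymmetry suggested by the abstract).
    student_log_prob = F.log_softmax(student_logits, dim=1)
    per_sample = -(teacher_prob * student_log_prob).sum(dim=1)
    if sample_weight is not None:
        # WTTM-style sample-adaptive weighting (weights supplied by the caller).
        per_sample = sample_weight * per_sample
    return per_sample.mean()


# Toy usage: a batch of 4 samples with 100 classes (CIFAR-100-sized output).
student = torch.randn(4, 100)
teacher = torch.randn(4, 100)
print(ttm_loss(student, teacher).item())
```

In a full training recipe this term would typically be combined with the usual cross-entropy loss on ground-truth labels; the T and β settings actually used in the paper's experiments are listed in its A.4.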
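The "Open Datasets" and "Dataset Splits" rows quote the standard CIFAR-100 statistics (50k training / 10k test images of size 32×32 over 100 classes). The snippet below is a minimal sketch of obtaining that split with torchvision; the paper's actual preprocessing and augmentation pipeline is not specified here.

```python
# Fetch the standard CIFAR-100 split (50,000 train / 10,000 test images).
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentations may differ
train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR100(root="./data", train=False, download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000
```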