Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation

Authors: Taehyeon Kim, Jaehoon Oh, Nak Yil Kim, Sangwook Cho, Se-Young Yun

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate the training and test accuracies according to the change in α in L and τ in L_KL (Figure 3). (A sketch of this loss appears below the table.)
Researcher Affiliation | Academia | Taehyeon Kim¹, Jaehoon Oh², Nak Yil Kim¹, Sangwook Cho¹ and Se-Young Yun¹; ¹Graduate School of Artificial Intelligence, KAIST; ²Graduate School of Knowledge Service Engineering, KAIST; {potter32, jhoon.oh, nakyilkim, sangwookcho, yunseyoung}@kaist.ac.kr
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/.
Open Datasets | Yes | Image classification on CIFAR-100 with a family of WideResNet (WRN) [Zagoruyko and Komodakis, 2016b] and ImageNet with a family of ResNet (RN) [He et al., 2016].
Dataset Splits | No | The paper mentions training and test datasets (CIFAR-100, ImageNet) but does not provide specific training/validation/test splits or explicitly mention a validation set in the experimental setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions a 'PyTorch SGD optimizer' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We used a standard PyTorch SGD optimizer with a momentum of 0.9 and weight decay, and applied standard data augmentation. Other than those mentioned, the training settings from the original papers [Heo et al., 2019a; Cho and Hariharan, 2019] were used. (A training-setup sketch appears below the table.)
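
For reference, the α and τ hyperparameters quoted in the Research Type row belong to the knowledge-distillation objective the paper studies: a cross-entropy term combined with either a temperature-scaled KL term or an MSE term between logits. The sketch below is a minimal illustration of that comparison, assuming the usual Hinton-style formulation; the function names and the default values of α and τ are illustrative and are not taken from the authors' repository.

```python
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, labels, alpha=0.9, tau=4.0):
    """Cross-entropy plus temperature-scaled KL divergence (Hinton-style KD).

    alpha weights the distillation term against the label loss; tau is the
    softmax temperature. Default values here are placeholders, not the
    settings used in the paper's experiments.
    """
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    return (1.0 - alpha) * ce + alpha * kl


def kd_mse_loss(student_logits, teacher_logits, labels, alpha=0.9):
    """Cross-entropy plus mean squared error computed directly on the logits."""
    ce = F.cross_entropy(student_logits, labels)
    mse = F.mse_loss(student_logits, teacher_logits)
    return (1.0 - alpha) * ce + alpha * mse
```

The τ² factor follows the common convention of keeping the distillation gradient on the same scale as the label gradient as τ varies; whether the paper adopts exactly this scaling is not stated in the excerpt.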
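
The Experiment Setup row can likewise be made concrete. The sketch below, assuming a CIFAR-100 run, wires up a PyTorch SGD optimizer with momentum 0.9 and weight decay as quoted; the learning rate, batch size, weight-decay value, augmentation pipeline, and the ResNet-18 stand-in for the student network are common defaults used only as placeholders, since the excerpt does not specify them.

```python
import torch
from torchvision import datasets, models, transforms

# "Standard data augmentation" for CIFAR-100 is assumed to be random crop plus
# horizontal flip; the normalization constants are the commonly used CIFAR-100 statistics.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

# ResNet-18 stands in for the paper's WRN student; the actual architectures are not built here.
student = models.resnet18(num_classes=100)

# SGD with momentum 0.9 as quoted; lr and weight-decay values are placeholders.
optimizer = torch.optim.SGD(student.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
```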