Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation
Authors: Taehyeon Kim, Jaehoon Oh, Nak Yil Kim, Sangwook Cho, Se-Young Yun
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the training and test accuracies according to the change in α in L and τ in L_KL (Figure 3). |
| Researcher Affiliation | Academia | Taehyeon Kim¹, Jaehoon Oh², Nak Yil Kim¹, Sangwook Cho¹ and Se-Young Yun¹; ¹Graduate School of Artificial Intelligence, KAIST; ²Graduate School of Knowledge Service Engineering, KAIST; {potter32, jhoon.oh, nakyilkim, sangwookcho, yunseyoung}@kaist.ac.kr |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/. |
| Open Datasets | Yes | image classification on CIFAR-100 with a family of WideResNet (WRN) [Zagoruyko and Komodakis, 2016b] and ImageNet with a family of ResNet (RN) [He et al., 2016]. |
| Dataset Splits | No | The paper mentions training and testing datasets (CIFAR-100, ImageNet) but does not provide specific training/validation/test dataset splits or explicit mention of a validation set in the experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions a 'PyTorch SGD optimizer' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We used a standard PyTorch SGD optimizer with a momentum of 0.9 and weight decay, and applied standard data augmentation. Other than those mentioned, the training settings from the original papers [Heo et al., 2019a; Cho and Hariharan, 2019] were used. |
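The Research Type row above refers to the two distillation objectives the paper compares: a KL-divergence term with temperature τ and a mean-squared-error term between logits, each blended with the cross-entropy loss via the weight α. Below is a minimal PyTorch sketch of both objectives, assuming the common softmax-temperature formulation of KD; the default values of `alpha` and `tau` and the exact weighting convention are illustrative assumptions, not details quoted from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss_kl(student_logits, teacher_logits, labels, alpha=0.5, tau=4.0):
    """KL-based knowledge distillation loss (hedged sketch).

    alpha balances cross-entropy against the distillation term; tau is the
    softmax temperature applied to both student and teacher logits.
    """
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)  # tau^2 keeps gradient scale comparable across temperatures
    return (1.0 - alpha) * ce + alpha * kl

def kd_loss_mse(student_logits, teacher_logits, labels, alpha=0.5):
    """MSE (logit-matching) variant: the KL term is replaced by an MSE between raw logits."""
    ce = F.cross_entropy(student_logits, labels)
    mse = F.mse_loss(student_logits, teacher_logits)
    return (1.0 - alpha) * ce + alpha * mse
```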
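The Experiment Setup row quotes only the optimizer choice, momentum, weight decay, and "standard data augmentation". The sketch below shows one plausible CIFAR-100 training setup matching that description; the learning rate, weight-decay value, batch size, normalization statistics, and the placeholder student network are assumptions (the paper defers unstated settings to [Heo et al., 2019a; Cho and Hariharan, 2019]).

```python
import torch
import torchvision
import torchvision.transforms as T

# Standard CIFAR-style augmentation (assumed to be the "standard data augmentation").
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),  # CIFAR-100 stats (assumed)
])
train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=train_transform
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder student network; the paper uses WideResNet students on CIFAR-100.
student = torchvision.models.resnet18(num_classes=100)

# SGD with momentum 0.9 as stated in the paper; lr and weight-decay values are assumed.
optimizer = torch.optim.SGD(
    student.parameters(),
    lr=0.1,
    momentum=0.9,
    weight_decay=5e-4,
)
```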