Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation
Authors: Taehyeon Kim, Jaehoon Oh, Nak Yil Kim, Sangwook Cho, Se-Young Yun
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the training and test accuracies according to the change in α in $\mathcal{L}$ and τ in $\mathcal{L}_{KL}$ (Figure 3). |
| Researcher Affiliation | Academia | Taehyeon Kim¹, Jaehoon Oh², Nak Yil Kim¹, Sangwook Cho¹ and Se-Young Yun¹; ¹Graduate School of Artificial Intelligence, KAIST; ²Graduate School of Knowledge Service Engineering, KAIST |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce the experiments is publicly available online at https://github.com/jhoon-oh/kd_data/. |
| Open Datasets | Yes | image classification on CIFAR-100 with a family of Wide-ResNet (WRN) [Zagoruyko and Komodakis, 2016b] and ImageNet with a family of ResNet (RN) [He et al., 2016]. |
| Dataset Splits | No | The paper mentions training and testing datasets (CIFAR-100, ImageNet) but does not provide specific training/validation/test dataset splits or explicit mention of a validation set in the experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch SGD optimizer' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We used a standard PyTorch SGD optimizer with a momentum of 0.9, weight decay, and apply standard data augmentation. Other than those mentioned, the training settings from the original papers [Heo et al., 2019a; Cho and Hariharan, 2019] were used. |
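
For readers unfamiliar with the two objectives named in the title, the following is a minimal, dependency-free sketch of a temperature-scaled KL distillation loss and a logit-matching MSE loss, combined with cross-entropy through a weight α. It is not the authors' code. The function names, the τ² scaling, the mean reduction in the MSE, and the default values of `alpha` and `tau` are all assumptions chosen for illustration; consult the paper and its repository for the exact definitions.

```python
import math


def log_softmax(logits, tau=1.0):
    """Log of the temperature-scaled softmax (tau is the temperature)."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def kl_distillation_loss(student_logits, teacher_logits, tau=1.0):
    """tau^2 * KL(teacher || student) on softened probabilities.

    The tau^2 factor is a common convention and an assumption here.
    """
    log_pt = log_softmax(teacher_logits, tau)
    log_ps = log_softmax(student_logits, tau)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return tau ** 2 * kl


def mse_logit_loss(student_logits, teacher_logits):
    """Mean squared error between raw logits (mean reduction is an assumption)."""
    n = len(student_logits)
    return sum((zs - zt) ** 2 for zs, zt in zip(student_logits, teacher_logits)) / n


def cross_entropy(student_logits, label):
    """Standard cross-entropy of the student against the true class index."""
    return -log_softmax(student_logits)[label]


def total_loss(student_logits, teacher_logits, label, alpha=0.5, tau=4.0):
    """(1 - alpha) * CE + alpha * KL term; alpha and tau defaults are hypothetical."""
    ce = cross_entropy(student_logits, label)
    kd = kl_distillation_loss(student_logits, teacher_logits, tau)
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    student = [1.0, 2.0, 3.0]
    teacher = [3.0, 2.0, 1.0]
    print("KL term :", kl_distillation_loss(student, teacher, tau=4.0))
    print("MSE term:", mse_logit_loss(student, teacher))
    print("Total   :", total_loss(student, teacher, label=2))
```

In this sketch, setting `alpha=0.0` reduces `total_loss` to plain cross-entropy, and identical student and teacher logits give a KL term of zero.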