Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
Authors: Zhiqiang Shen, Zechun Liu, Dejia Xu, Zitian Chen, Kwang-Ting Cheng, Marios Savvides
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After that, we study its one-sidedness and imperfection of the incompatibility view through massive analyses, visualizations and comprehensive experiments on Image Classification, Binary Networks, and Neural Machine Translation. |
| Researcher Affiliation | Academia | Zhiqiang Shen (CMU), Zechun Liu (CMU & HKUST), Dejia Xu (Peking University), Zitian Chen (UMass Amherst), Kwang-Ting Cheng (HKUST), Marios Savvides (CMU) |
| Pseudocode | Yes | Algorithm 1 PyTorch-like Code for Calculating Stability Metric. |
| Open Source Code | No | The paper states "Project page: http://zhiqiangshen.com/projects/LS_and_KD/index.html." but does not explicitly provide a direct link to a source-code repository or state that the code is publicly released for the work described. |
| Open Datasets | Yes | We conduct experiments on three datasets: ImageNet-1K (Deng et al., 2009), CUB200-2011 (Wah et al., 2011a) and iMaterialist product recognition challenge data (in Appendix D). |
| Dataset Splits | Yes | While on validation set the accuracy is comparable or even slightly better (The boosts on CUB is greater than those on ImageNet-1K, as shown in Table 2). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper provides Python-like pseudocode but does not list specific software dependencies with version numbers used for the experiments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For training teacher networks, we follow the standard training protocol (He et al., 2016; Goyal et al., 2017), i.e., total training epoch is 90, initial learning rate is 0.1 and decayed to 1/10 with every 30 epochs. For distillation, as the supervision is a soft distribution and will dynamically change, we train with 200 epochs and the learning rate is multiplied by 0.1 at 80 and 160 epochs. |
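The two step schedules quoted in the Experiment Setup row can be sketched as plain functions of a 0-indexed epoch. This is a minimal illustration under stated assumptions, not the authors' code: the function names `teacher_lr` and `distill_lr` are hypothetical, and the distillation base learning rate of 0.1 is an assumption, since the quoted text only gives the initial rate for teacher training.

```python
def teacher_lr(epoch: int, base_lr: float = 0.1, total_epochs: int = 90) -> float:
    """Teacher training: LR starts at 0.1 and is divided by 10 every 30 epochs."""
    if not 0 <= epoch < total_epochs:
        raise ValueError(f"epoch must be in [0, {total_epochs}), got {epoch}")
    return base_lr * (0.1 ** (epoch // 30))


def distill_lr(epoch: int, base_lr: float = 0.1, milestones=(80, 160),
               total_epochs: int = 200) -> float:
    """Distillation: LR multiplied by 0.1 at epochs 80 and 160.

    ASSUMPTION: base_lr = 0.1; the quoted setup does not state the
    initial learning rate used for distillation.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError(f"epoch must be in [0, {total_epochs}), got {epoch}")
    # Count how many milestones have already been reached.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * (0.1 ** decays)


if __name__ == "__main__":
    # Print the rate just before and at each decay boundary.
    for e in (0, 29, 30, 59, 60, 89):
        print("teacher", e, teacher_lr(e))
    for e in (0, 79, 80, 159, 160, 199):
        print("distill", e, distill_lr(e))
```

In PyTorch, the same schedules would typically be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` and `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 160], gamma=0.1)`, calling `scheduler.step()` once per epoch.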