Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?
Authors: Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our discovery is comprehensively supported by large-scale experiments, analyses and case studies including image classification, neural machine translation and compact student distillation tasks spanning across multiple datasets and teacher-student architectures. |
| Researcher Affiliation | Academia | Singapore University of Technology and Design (SUTD). Correspondence to: Ngai-Man Cheung <ngaiman_cheung@sutd.edu.sg>. |
| Pseudocode | Yes | We include the visualization algorithm and Numpy-style code in Supplementary F. |
| Open Source Code | Yes | Code and models are available at https://keshik6.github.io/revisiting-ls-kd-compatibility/ |
| Open Datasets | Yes | large-scale KD experiments including image classification using ImageNet-1K (Deng et al., 2009), fine-grained image classification using CUB200-2011 (Wah et al., 2011), neural machine translation (English-German, English-Russian translation) using IWSLT |
| Dataset Splits | Yes | For visualization of penultimate layer representations, we use 150 samples for training set and 50 samples for validation set. |
| Hardware Specification | No | The paper does not specify particular hardware components such as specific GPU or CPU models used for running the experiments. |
| Software Dependencies | Yes | To allow for training in containerised environments (HPC, Super-computing clusters), please use nvcr.io/nvidia/pytorch:20.12-py3 container. |
| Experiment Setup | Yes | For training LS networks, we train for 90 epochs with initial learning rate 0.1 decayed by a factor of 10 every 30 epochs. For KD experiments, we train for 200 epochs with initial learning rate 0.1 decayed by a factor of 10 every 80 epochs. We conducted a grid search for hyper-parameters as well. For all experiments, we use a batch size of 256 and SGD with momentum 0.9. (See the configuration sketch after this table.) |
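
The experiment-setup row above maps onto a standard SGD + step-decay schedule. Below is a minimal sketch, assuming a PyTorch-style training loop: the placeholder model, random data, and plain cross-entropy loss are illustrative assumptions, and only the quoted hyper-parameters (batch size 256, SGD with momentum 0.9, initial learning rate 0.1 decayed by a factor of 10) come from the paper.

```python
# Minimal sketch of the quoted training schedules (assumed PyTorch-style setup;
# the tiny model, random data, and plain cross-entropy loss are placeholders).
import torch

model = torch.nn.Linear(512, 1000)       # placeholder network
criterion = torch.nn.CrossEntropyLoss()  # plain CE; the LS / KD losses are not shown here

# Both settings: SGD, momentum 0.9, initial learning rate 0.1, batch size 256.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# LS networks: 90 epochs, LR decayed by 10x every 30 epochs.
# KD experiments: 200 epochs, LR decayed by 10x every 80 epochs.
EPOCHS, STEP = 90, 30                    # switch to 200, 80 for the KD schedule
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=STEP, gamma=0.1)

for epoch in range(EPOCHS):
    inputs = torch.randn(256, 512)                 # stand-in batch (batch size 256)
    targets = torch.randint(0, 1000, (256,))
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                               # step decay applied per epoch
```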