Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
When does label smoothing help?
Authors: Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we show empirically that in addition to improving generalization, label smoothing improves model calibration which can significantly improve beam-search. |
| Researcher Affiliation | Industry | Rafael Müller, Simon Kornblith, Geoffrey Hinton Google Brain Toronto EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories for the described methodology. |
| Open Datasets | Yes | In Fig. 1, we show results of visualizing penultimate layer representations of image classifiers trained on the datasets CIFAR-10, CIFAR-100 and ImageNet with the architectures AlexNet [12], ResNet-56 [13] and Inception-v4 [14], respectively. |
| Dataset Splits | Yes | The first two columns represent examples from the training and validation set for a network trained without label smoothing (w/o LS). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | For a network trained with a label smoothing of parameter α, we minimize instead the cross-entropy between the modified targets $y_k^{LS}$ and the networks outputs $p_k$, where $y_k^{LS} = y_k(1 - \alpha) + \alpha/K$. ... By training the same model with α = 0.05 (green line), we obtain a model that is similarly calibrated compared to temperature scaling. ... we consider only the case where β = 1, i.e., when the targets are the teacher output and the true labels are ignored. |
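
The label-smoothing target quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (the paper releases none); the function names, the example logits, and the four-class setup are assumptions chosen for illustration, and only the formula for the smoothed targets and the value α = 0.05 come from the quoted text.

```python
import math


def smooth_labels(one_hot, alpha):
    """Smoothed targets: y_k * (1 - alpha) + alpha / K for each class k."""
    num_classes = len(one_hot)
    return [y * (1.0 - alpha) + alpha / num_classes for y in one_hot]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0)


# Hypothetical 4-class example; class index 2 is the true label.
hard = [0.0, 0.0, 1.0, 0.0]
soft = smooth_labels(hard, alpha=0.05)  # alpha = 0.05 as in the quoted setup

# Hypothetical logits standing in for a network's output.
probs = softmax([0.5, -1.0, 3.0, 0.0])

loss_hard = cross_entropy(hard, probs)
loss_soft = cross_entropy(soft, probs)
```

With α = 0.05 and K = 4, the true class receives a target of 0.95 + 0.0125 = 0.9625 and every other class receives 0.0125, so the targets still sum to 1. Because the smoothed loss also penalizes low probability on the incorrect classes, it discourages the network from driving the correct-class logit arbitrarily far above the rest.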