When does label smoothing help?
Authors: Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we show empirically that in addition to improving generalization, label smoothing improves model calibration which can significantly improve beam-search. |
| Researcher Affiliation | Industry | Rafael Müller, Simon Kornblith, Geoffrey Hinton; Google Brain Toronto; rafaelmuller@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories for the described methodology. |
| Open Datasets | Yes | In Fig. 1, we show results of visualizing penultimate layer representations of image classifiers trained on the datasets CIFAR-10, CIFAR-100 and ImageNet with the architectures AlexNet [12], ResNet-56 [13] and Inception-v4 [14], respectively. |
| Dataset Splits | Yes | The first two columns represent examples from the training and validation set for a network trained without label smoothing (w/o LS). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | For a network trained with a label smoothing of parameter α, we minimize instead the cross-entropy between the modified targets $y_k^{LS}$ and the networks outputs $p_k$, where $y_k^{LS} = y_k(1-\alpha) + \alpha/K$. ... By training the same model with α = 0.05 (green line), we obtain a model that is similarly calibrated compared to temperature scaling. ... we consider only the case where β = 1, i.e., when the targets are the teacher output and the true labels are ignored. |
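
The smoothed-target formula quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (the paper releases none). The function names `smooth_labels`, `softmax`, and `cross_entropy`, the four-class one-hot label, and the logit values are all assumptions chosen for illustration; only α = 0.05 is taken from the quoted text.

```python
import math


def smooth_labels(one_hot, alpha):
    """Apply y_k * (1 - alpha) + alpha / K to every class k (K = number of classes)."""
    num_classes = len(one_hot)
    return [y * (1.0 - alpha) + alpha / num_classes for y in one_hot]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)


# Hypothetical example: 4 classes, true class is index 1.
one_hot = [0.0, 1.0, 0.0, 0.0]
logits = [1.0, 3.0, 0.5, -1.0]  # illustrative values, not from the paper

targets = smooth_labels(one_hot, alpha=0.05)  # alpha value quoted in the table
probs = softmax(logits)

hard_loss = cross_entropy(one_hot, probs)
smoothed_loss = cross_entropy(targets, probs)
print(targets)
print(hard_loss, smoothed_loss)
```

With α = 0.05 and K = 4, the true class receives a target of 0.9625 and every other class 0.0125, so the targets still sum to one. Because the smoothed loss mixes in the cross-entropy against a uniform distribution, it penalizes very confident predictions more than the hard-target loss does.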