Does label smoothing mitigate label noise?

Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Menon, Sanjiv Kumar

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now present experimental observations of the effects of label smoothing under label noise. We then provide insights into why smoothing can successfully denoise labels, by viewing smoothing as a form of shrinkage regularisation. We begin by empirically answering the question: can label smoothing successfully mitigate label noise? To study this, we employ smoothing in settings where the training data is artificially injected with symmetric label noise. This follows the convention in the label noise literature (Patrini et al., 2017; Han et al., 2018a; Charoenphakdee et al., 2019). Specifically, we consider the CIFAR-10, CIFAR-100 and ImageNet datasets, and add symmetric label noise at level ρ = 20% to the training (but not the test) set; i.e., we replace the training label with a uniformly chosen label 20% of the time. On CIFAR-10 and CIFAR-100 we train two different models on this noisy data, ResNet-32 and ResNet-56, with similar hyperparameters as Müller et al. (2019). Each experiment is repeated five times, and we report the mean and standard deviation of the clean test accuracy. On ImageNet we train ResNet-v2-50 with LARS (You et al., 2017). We describe the detailed experimental setup in Appendix B. [A sketch of this noise-injection step is given after the table.]
Researcher Affiliation | Industry | Michal Lukasik¹, Srinadh Bhojanapalli¹, Aditya Krishna Menon¹, Sanjiv Kumar¹ (¹Google Research). Correspondence to: Michal Lukasik <mlukasik@google.com>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Specifically, we consider the CIFAR-10, CIFAR-100 and ImageNet datasets, and add symmetric label noise at level ρ = 20% to the training (but not the test) set; i.e., we replace the training label with a uniformly chosen label 20% of the time.
Dataset Splits | No | The paper mentions training and testing sets, but does not explicitly describe a validation split or how it was used, nor does it provide specific percentages or sample counts for such a split in the provided text.
Hardware Specification | No | The paper mentions models like ResNet-32, ResNet-56, and ResNet-v2-50, but it does not specify the hardware (e.g., specific GPUs, CPUs, or cloud instances) used for the experiments. It refers to Appendix B for the detailed setup, but this is not provided in the text.
Software Dependencies | No | The paper mentions using LARS (You et al., 2017) and refers to the hyperparameters of Müller et al. (2019), but it does not provide specific software library names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed for replication. It refers to Appendix B for the detailed setup, but this is not provided in the text.
Experiment Setup | Yes | On CIFAR-10 and CIFAR-100 we train two different models on this noisy data, ResNet-32 and ResNet-56, with similar hyperparameters as Müller et al. (2019). Each experiment is repeated five times, and we report the mean and standard deviation of the clean test accuracy. On ImageNet we train ResNet-v2-50 with LARS (You et al., 2017). We describe the detailed experimental setup in Appendix B. As loss functions, our baseline is training with the softmax cross-entropy on the noisy labels. We then employ label smoothing (1) (LS) for various values of α, as well as forward (FC) and backward (BC) correction (4), (5) assuming symmetric noise for various values of α. (...) We use label smoothing parameter α = 0.1 and temperature parameter T = 2 during distillation, for all these experiments.
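
To make the quoted noise protocol concrete (each training label is replaced by a uniformly chosen label with probability ρ = 0.2; test labels are left untouched), here is a minimal NumPy sketch of one common way to implement it. The function name and the choice to draw the replacement uniformly over all classes (rather than over the other classes only) are our assumptions; the paper's exact recipe is in its Appendix B, which is not reproduced here.

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, rho=0.2, seed=0):
    """Replace each training label with a uniformly drawn label with probability rho.

    Sketch of the symmetric-noise convention described in the paper; whether the
    uniform draw may return the original class is a convention choice.
    """
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(noisy.shape[0]) < rho                      # which examples get noised
    noisy[flip] = rng.integers(0, num_classes, size=flip.sum())  # uniform over all classes
    return noisy

# Example: a CIFAR-10-sized training set at rho = 20%; test labels stay clean.
clean_train_labels = np.random.randint(0, 10, size=50_000)
noisy_train_labels = inject_symmetric_noise(clean_train_labels, num_classes=10, rho=0.2)
```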
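As a concrete reference for the losses named in the Experiment Setup row, below is a minimal PyTorch sketch of label smoothing and of forward/backward correction under an assumed symmetric-noise matrix, following the standard definitions from Patrini et al. (2017) and Müller et al. (2019). The equation numbers (1), (4), (5) refer to the paper itself; the function names and the particular transition-matrix convention (flipped mass spread over all classes) are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_matrix(num_classes, alpha):
    """Keep a label with probability (1 - alpha); spread the remaining mass
    uniformly over all classes (one common symmetric-noise convention)."""
    eye = torch.eye(num_classes)
    return (1 - alpha) * eye + (alpha / num_classes) * torch.ones(num_classes, num_classes)

def label_smoothing_loss(logits, labels, alpha):
    """Cross-entropy against smoothed targets (1 - alpha) * one_hot + alpha / K."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    targets = (1 - alpha) * F.one_hot(labels, num_classes).float() + alpha / num_classes
    return -(targets * log_probs).sum(dim=-1).mean()

def forward_corrected_loss(logits, labels, alpha):
    """Forward correction: push the model's class probabilities through the
    assumed noise matrix T, then take cross-entropy with the observed labels."""
    T = symmetric_matrix(logits.size(-1), alpha).to(logits.device)
    noisy_probs = F.softmax(logits, dim=-1) @ T        # T[i, j] = P(observed j | true i)
    return F.nll_loss(torch.log(noisy_probs + 1e-12), labels)

def backward_corrected_loss(logits, labels, alpha):
    """Backward correction: multiply the vector of per-class losses by T^{-1}
    and read off the entry of the observed label."""
    num_classes = logits.size(-1)
    T_inv = torch.linalg.inv(symmetric_matrix(num_classes, alpha)).to(logits.device)
    per_class_loss = -F.log_softmax(logits, dim=-1)    # loss if each class were the label
    corrected = per_class_loss @ T_inv.T               # corrected loss for every possible label
    return corrected.gather(1, labels.unsqueeze(1)).mean()
```

A known contrast between the two corrections is that forward correction keeps per-example losses nonnegative, whereas backward correction can yield negative values because T^{-1} has negative entries.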