To Smooth or Not? When Label Smoothing Meets Noisy Labels

Authors: Jiaheng Wei, Hangyu Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama, Yang Liu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide extensive experimental evidence to support our findings. For instance, on multiple benchmark datasets, we present a clear transition of the optimal smoothing rate from positive to negative as we keep increasing the noise rate. In particular, we show that a negative smoothing rate elicits higher model confidence on correct predictions and lower confidence on wrong predictions, compared with the behavior of a positive one, on CIFAR-10 test data. Code is publicly available at https://github.com/UCSC-REAL/negative-label-smoothing. (A sketch of a loss with a negative smoothing rate follows the table.)
Researcher Affiliation | Academia | 1University of California, Santa Cruz; 2Brown University; 3TML Lab, Sydney AI Centre, The University of Sydney; 4RIKEN AIP; 5University of Tokyo. Correspondence to: Yang Liu <yangliu@ucsc.edu>.
Pseudocode | No | The paper describes methods through mathematical formulations and textual explanations, but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/UCSC-REAL/negative-label-smoothing.
Open Datasets | Yes | UCI datasets (Dua & Graff, 2017), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), CIFAR-N (Wei et al., 2022b), and Clothing1M (Xiao et al., 2015).
Dataset Splits | No | No specific train/validation/test split proportions or absolute counts are provided in the paper for common datasets like CIFAR. While a 'training and validation set' is mentioned, no details on their sizes or split methodology are given.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions optimizers such as Adam and SGD and model architectures such as ResNet34, but does not provide specific software dependencies with version numbers (e.g., PyTorch, Python, or CUDA versions).
Experiment Setup | Yes | We adopted ResNet34 (He et al., 2016), trained for 200 epochs with batch size 128 using the SGD (Robbins & Monro, 1951) optimizer with Nesterov momentum 0.9 and weight decay 1e-4. The learning rate for the first 100 epochs is 0.1; it is then multiplied by 0.1 every 50 epochs. Following (Liu & Guo, 2020), we adopted a two-layer ReLU Multi-Layer Perceptron (MLP) for classification tasks on multiple UCI datasets, trained for 1000 episodes with batch size 64 and the Adam (Kingma & Ba, 2014) optimizer. We report the best performance for each smoothing rate under a set of learning-rate settings: [0.0007, 0.001, 0.005, 0.01, 0.05]. (A configuration sketch follows the table.)
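
For context on the negative smoothing rate discussed in the Research Type row: generalized label smoothing (GLS) replaces the one-hot target with (1 - r) * onehot(y) + (r / K) * uniform over K classes, where r > 0 recovers standard label smoothing and r < 0 gives negative label smoothing (NLS). The PyTorch sketch below is one way to write such a loss; the function name gls_loss is ours, and this is not necessarily the exact implementation in the authors' repository.

import torch
import torch.nn.functional as F

def gls_loss(logits, targets, smooth_rate=0.0):
    """Cross-entropy against a generalized label smoothing (GLS) target.

    smooth_rate > 0: standard (positive) label smoothing
    smooth_rate = 0: plain cross-entropy
    smooth_rate < 0: negative label smoothing (NLS)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Term for the one-hot part of the soft target: -log p_y.
    nll = -log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
    # Term for the uniform part: (1/K) * sum_k -log p_k.
    uniform = -log_probs.mean(dim=-1)
    # Soft target is (1 - r) * one-hot + (r / K) * all-ones.
    return ((1.0 - smooth_rate) * nll + smooth_rate * uniform).mean()

With a negative rate such as smooth_rate = -0.2, the non-target classes receive negative target mass, so the loss rewards higher confidence on the labeled class; per the paper, the optimal rate moves from positive to negative as the label noise rate grows.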
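
Likewise, a sketch of the reported CIFAR training configuration; torchvision's resnet34 is assumed here as a stand-in for ResNet34 (He et al., 2016), since the paper does not name a specific implementation.

import torch
from torchvision.models import resnet34

# ResNet34, 200 epochs, batch size 128, SGD with Nesterov momentum 0.9
# and weight decay 1e-4, as reported in the experiment setup above.
model = resnet34(num_classes=10)  # num_classes=100 for CIFAR-100
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            nesterov=True, weight_decay=1e-4)
# Learning rate 0.1 for the first 100 epochs, then multiplied by 0.1
# every 50 epochs (i.e., at epochs 100 and 150 of a 200-epoch run).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # One pass over the training loader with gls_loss(...) would go here.
    scheduler.step()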