Normalized Loss Functions for Deep Learning with Noisy Labels

Authors: Xingjun Ma, Hanxun Huang, Yisen Wang, Simone Romano, Sarah Erfani, James Bailey

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels.
Researcher Affiliation | Academia | The University of Melbourne, Australia; Shanghai Jiao Tong University, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no specific repository links or explicit code-release statements.
Open Datasets | Yes | In this section, we empirically investigate our proposed APL loss functions on benchmark datasets MNIST (LeCun et al., 1998), CIFAR-10/-100 (Krizhevsky & Hinton, 2009), and a real-world noisy dataset WebVision (Li et al., 2017a).
Dataset Splits | Yes | We test the combinations between α ∈ {0.1, 1.0, 10.0} and β ∈ {0.1, 1.0, 10.0, 100.0}, then select the optimal combination according to the validation accuracy on a randomly sampled validation set (20% of the training data). We evaluate the trained networks on the same 50 classes of the ILSVRC12 validation set, which can be considered as a clean validation. (A sketch of this selection procedure follows the table.)
Hardware Specification | No | The paper mentions the 'LIEF HPC-GPU Facility' but does not specify exact GPU/CPU models, processor types, memory amounts, or other detailed machine specifications used for running its experiments.
Software Dependencies | No | The paper mentions using an 'SGD optimizer' and model architectures such as 'ResNet-34', 'ResNet-50', and an '8-layer CNN', but does not provide ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9, or a specific CUDA version).
Experiment Setup | Yes | We train the networks for 50, 120 and 200 epochs for MNIST, CIFAR-10, and CIFAR-100, respectively. For all the training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing. Weight decay is set to 1×10⁻³, 1×10⁻⁴ and 1×10⁻⁵ for MNIST, CIFAR-10 and CIFAR-100, respectively. The initial learning rate is set to 0.01 for MNIST/CIFAR-10 and 0.1 for CIFAR-100. For GCE, we set ρ = 0.7. For SCE, we set A = −4, and α = 0.01, β = 1.0 for MNIST, α = 0.1, β = 1.0 for CIFAR-10, α = 6.0, β = 0.1 for CIFAR-100. For FL, we set γ = 0.5. For our APL losses, we empirically set α = 1, β = 100 for MNIST, α = β = 1 for CIFAR-10, and α = 10, β = 0.1 for CIFAR-100. For GCE, we use the suggested α = 0.7, while for SCE, we use the setting with A = −4, α = 10.0, β = 1.0. For our two APL losses, we set α = 50.0, β = 0.1 for NCE+RCE and α = 50.0, β = 1.0 for NCE+MAE.
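
The Experiment Setup row quotes per-dataset epochs, SGD with momentum 0.9, cosine learning-rate annealing, per-dataset weight decay, and α, β weights for the APL losses. The sketch below shows one way the quoted CIFAR-10 settings and an NCE+RCE-style combined loss could be wired together in PyTorch. It is a minimal sketch, not the authors' released code: the NCEandRCE class, the stand-in linear model, and the mean reduction are illustrative assumptions based on the quoted values and the paper's loss definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCEandRCE(nn.Module):
    """alpha * Normalized Cross Entropy + beta * Reverse Cross Entropy (one APL combination)."""

    def __init__(self, alpha=1.0, beta=1.0, log_zero=-4.0):
        super().__init__()
        self.alpha, self.beta = alpha, beta
        self.log_zero = log_zero  # value used in place of log(0) inside RCE (A in the paper)

    def forward(self, logits, labels):
        log_probs = F.log_softmax(logits, dim=1)
        probs = log_probs.exp()
        p_y = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        log_p_y = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        # Normalized CE: per-sample CE divided by the sum of CE over all candidate labels.
        nce = log_p_y / log_probs.sum(dim=1)
        # Reverse CE with log(0) := A collapses to -A * (1 - p_y).
        rce = -self.log_zero * (1.0 - p_y)
        return (self.alpha * nce + self.beta * rce).mean()

# Optimizer settings quoted for CIFAR-10: SGD, momentum 0.9, lr 0.01,
# weight decay 1e-4, cosine annealing over 120 epochs, alpha = beta = 1.
model = nn.Linear(3 * 32 * 32, 10)  # hypothetical stand-in for the paper's 8-layer CNN
criterion = NCEandRCE(alpha=1.0, beta=1.0)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=120)
```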
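
The Dataset Splits row describes choosing α and β from small grids according to accuracy on a randomly sampled 20% validation split of the training data. Below is a minimal sketch of that selection loop, assuming a hypothetical `train_and_evaluate` helper (not defined in the paper) that trains a model with the given loss weights and reports validation accuracy.

```python
import itertools

import torch
from torch.utils.data import random_split

# Candidate grids and the 20% validation split described in the quoted protocol.
ALPHAS = [0.1, 1.0, 10.0]
BETAS = [0.1, 1.0, 10.0, 100.0]

def split_train_val(dataset, val_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the (noisy) training set as a validation set."""
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=generator)

def select_alpha_beta(dataset, train_and_evaluate):
    """Pick the (alpha, beta) pair with the highest validation accuracy.

    `train_and_evaluate(train_set, val_set, alpha, beta)` is a hypothetical helper
    that trains a model with the given loss weights (e.g. NCEandRCE above) and
    returns accuracy on the held-out split.
    """
    train_set, val_set = split_train_val(dataset)
    best_pair, best_acc = None, float("-inf")
    for alpha, beta in itertools.product(ALPHAS, BETAS):
        acc = train_and_evaluate(train_set, val_set, alpha, beta)
        if acc > best_acc:
            best_pair, best_acc = (alpha, beta), acc
    return best_pair, best_acc
```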