Normalized Loss Functions for Deep Learning with Noisy Labels
Authors: Xingjun Ma, Hanxun Huang, Yisen Wang, Simone Romano, Sarah Erfani, James Bailey
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmark datasets demonstrate that the family of new loss functions created by our APL framework can consistently outperform state-of-the-art methods by large margins, especially under large noise rates such as 60% or 80% incorrect labels. |
| Researcher Affiliation | Academia | The University of Melbourne, Australia; Shanghai Jiao Tong University, China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no specific repository links or explicit code release statements. |
| Open Datasets | Yes | In this section, we empirically investigate our proposed APL loss functions on benchmark datasets MNIST (LeCun et al., 1998), CIFAR-10/-100 (Krizhevsky & Hinton, 2009), and a real-world noisy dataset WebVision (Li et al., 2017a). |
| Dataset Splits | Yes | We test the combinations between α ∈ {0.1, 1.0, 10.0} and β ∈ {0.1, 1.0, 10.0, 100.0}, then select the optimal combination according to the validation accuracy on a randomly sampled validation set (20% of the training data). We evaluate the trained networks on the same 50 classes of the ILSVRC12 validation set, which can be considered as a clean validation set. (A grid-search sketch of this selection procedure follows the table.) |
| Hardware Specification | No | The paper mentions 'LIEF HPC-GPU Facility' but does not specify exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using an 'SGD optimizer' and model architectures such as 'ResNet-34', 'ResNet-50', and an '8-layer CNN', but does not provide specific ancillary software details with version numbers (e.g., library or solver names like Python 3.8, PyTorch 1.9, or specific CUDA versions). |
| Experiment Setup | Yes | We train the networks for 50, 120 and 200 epochs for MNIST, CIFAR-10, and CIFAR-100, respectively. For all the training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing. Weight decay is set to 1×10⁻³, 1×10⁻⁴ and 1×10⁻⁵ for MNIST, CIFAR-10 and CIFAR-100, respectively. The initial learning rate is set to 0.01 for MNIST/CIFAR-10 and 0.1 for CIFAR-100. For GCE, we set ρ = 0.7. For SCE, we set A = 4, and α = 0.01, β = 1.0 for MNIST, α = 0.1, β = 1.0 for CIFAR-10, α = 6.0, β = 0.1 for CIFAR-100. For FL, we set γ = 0.5. For our APL losses, we empirically set α = 1, β = 100 for MNIST, α = β = 1 for CIFAR-10, and α = 10, β = 0.1 for CIFAR-100. For GCE, we use the suggested α = 0.7, while for SCE, we use the setting with A = 4, α = 10.0, β = 1.0. For our two APL losses, we set α = 50.0, β = 0.1 for NCE+RCE and α = 50.0, β = 1.0 for NCE+MAE. (Loss and training-loop sketches follow the table.) |
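
The quoted setup references the paper's APL combinations NCE+RCE and NCE+MAE with weights α and β. The PyTorch sketch below assembles the α·NCE + β·RCE combination from the paper's definitions (normalized cross entropy, and reverse cross entropy with the log(0) term truncated to a negative constant). It is an illustration written for this report, not the authors' released code; in particular, the clamp constant `1e-4` is our assumption for the truncation.

```python
import torch
import torch.nn.functional as F

class NCEandRCE(torch.nn.Module):
    """Sketch of the APL combination alpha * NCE + beta * RCE."""

    def __init__(self, alpha, beta, num_classes):
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        self.num_classes = num_classes

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=1)  # log p(k|x) for each class k
        # Normalized Cross Entropy: CE(f(x), y) / sum_j CE(f(x), j).
        # Numerator and denominator are both negative log-probabilities,
        # so the ratio is a loss bounded in [0, 1].
        nce = log_probs.gather(1, target.unsqueeze(1)).squeeze(1) / log_probs.sum(dim=1)
        # Reverse Cross Entropy: -sum_k p(k|x) log q(k|x) with one-hot label q;
        # clamping q at 1e-4 (our assumption) stands in for the paper's
        # truncation of log(0) to a negative constant.
        probs = log_probs.exp()
        one_hot = F.one_hot(target, self.num_classes).float().clamp(min=1e-4)
        rce = -(probs * one_hot.log()).sum(dim=1)
        return (self.alpha * nce + self.beta * rce).mean()
```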
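A minimal sketch of the quoted CIFAR-10 optimization settings follows (SGD with momentum 0.9, weight decay 1×10⁻⁴, initial learning rate 0.01, cosine annealing over 120 epochs). The linear model is a toy stand-in: the paper's 8-layer CNN architecture is not specified in the excerpt.

```python
import torch

# Toy stand-in for the paper's 8-layer CNN (architecture not reproduced here).
model = torch.nn.Linear(3 * 32 * 32, 10)
criterion = NCEandRCE(alpha=1.0, beta=1.0, num_classes=10)  # α = β = 1 for CIFAR-10

# Quoted CIFAR-10 settings: SGD, momentum 0.9, weight decay 1e-4, lr 0.01,
# cosine learning rate annealing over the 120 training epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=120)

for epoch in range(120):
    # ... one pass over the noisy-label training set goes here ...
    scheduler.step()
```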
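Finally, the α/β selection quoted under 'Dataset Splits' amounts to a grid search over the stated values with a 20% held-out validation split. In the sketch below, `train_and_evaluate` is a hypothetical callback standing in for a full training run; only the grid values and the split ratio come from the paper.

```python
import itertools
import torch
from torch.utils.data import random_split

def select_alpha_beta(train_set, train_and_evaluate, seed=0):
    """Grid search over the quoted α/β values using a 20% validation split.

    `train_and_evaluate` is a hypothetical callback: it trains a model with
    the given (alpha, beta) on the training subset and returns the validation
    accuracy on the held-out subset.
    """
    n_val = int(0.2 * len(train_set))  # 20% of the training data, as quoted
    train_subset, val_subset = random_split(
        train_set, [len(train_set) - n_val, n_val],
        generator=torch.Generator().manual_seed(seed))
    best = None
    for alpha, beta in itertools.product([0.1, 1.0, 10.0],
                                         [0.1, 1.0, 10.0, 100.0]):
        acc = train_and_evaluate(train_subset, val_subset, alpha, beta)
        if best is None or acc > best[0]:
            best = (acc, alpha, beta)
    return best  # (validation accuracy, alpha, beta)
```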