Improving generalization by controlling label-noise information in neural network weights

Authors: Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels. We show that methods based on gradient prediction yield drastic improvements over standard training algorithms (like cross-entropy loss), and outperform competitive approaches designed for learning with noisy labels.
Researcher Affiliation | Academia | Hrayr Harutyunyan (1), Kyle Reing (1), Greg Ver Steeg (1), Aram Galstyan (1); (1) Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292.
Pseudocode | Yes | The pseudocode of LIMIT is presented in the supplementary material (Alg. 1).
Open Source Code | Yes | The implementation of the proposed method and the code for replicating the experiments is available at https://github.com/hrayrhar/limit-label-memorization.
Open Datasets | Yes | We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels.
Dataset Splits | Yes | We split the 60K images of MNIST into training and validation sets, containing 48K and 12K samples respectively. We split the 50K images of CIFAR-10 into training and validation sets, containing 40K and 10K samples respectively.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions the ADAM optimizer and ResNet-34 networks, but does not provide version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We train all baselines except DMI using the ADAM optimizer (Kingma & Ba, 2014) with learning rate 10^-3 and β1 = 0.9. As DMI is very sensitive to the learning rate, we tune it by choosing the best from the following grid of values {10^-3, 10^-4, 10^-5, 10^-6}. (A hedged code sketch of these settings follows the table.)
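As a reading aid, the sketch below restates the settings quoted in the Dataset Splits and Experiment Setup rows. It assumes PyTorch/torchvision; the split procedure, network architecture, and β2 value are not given in the excerpts above, so those parts are illustrative placeholders rather than the authors' implementation (see the linked repository for the real one).

```python
# Minimal sketch (assumes PyTorch + torchvision). Reproduces only the quoted
# settings: MNIST 60K -> 48K train / 12K validation, ADAM with lr = 1e-3, beta1 = 0.9.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# MNIST training set (60K images); random_split is illustrative, the paper's
# exact split procedure is not specified in the excerpt.
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
train_set, val_set = random_split(mnist, [48_000, 12_000])

# Placeholder model, not the paper's architecture.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

# ADAM with the quoted hyperparameters; beta2 is left at its default (0.999),
# which the excerpt does not specify.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Learning-rate grid quoted for tuning DMI (selection criterion not shown here).
dmi_lr_grid = [1e-3, 1e-4, 1e-5, 1e-6]
```

The grid values mirror the quoted {10^-3, 10^-4, 10^-5, 10^-6}; in the paper the best DMI learning rate is chosen from this grid, a step omitted from the sketch.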