Improving generalization by controlling label-noise information in neural network weights
Authors: Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels. We show that methods based on gradient prediction yield drastic improvements over standard training algorithms (like cross-entropy loss), and outperform competitive approaches designed for learning with noisy labels. |
| Researcher Affiliation | Academia | Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan (Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292). |
| Pseudocode | Yes | The pseudocode of LIMIT is presented in the supplementary material (Alg. 1). |
| Open Source Code | Yes | The implementation of the proposed method and the code for replicating the experiments is available at https://github.com/hrayrhar/limit-label-memorization. |
| Open Datasets | Yes | We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels. |
| Dataset Splits | Yes | We split the 60K images of MNIST into training and validation sets, containing 48K and 12K samples respectively. We split the 50K images of CIFAR-10 into training and validation sets, containing 40K and 10K samples respectively. (A split sketch follows this table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using ADAM optimizer and ResNet-34 networks, but does not provide specific version numbers for software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We train all baselines except DMI using the ADAM optimizer (Kingma & Ba, 2014) with learning rate = 10^-3 and β1 = 0.9. As DMI is very sensitive to the learning rate, we tune it by choosing the best from the following grid of values {10^-3, 10^-4, 10^-5, 10^-6}. (An optimizer sketch follows this table.) |
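
The dataset-split row above reports only the resulting sizes (MNIST 60K into 48K/12K, CIFAR-10 50K into 40K/10K), not the splitting procedure or seed. The following is a minimal sketch of how such splits could be reproduced in PyTorch; the use of `random_split` and the fixed seed are assumptions, not details from the paper.

```python
# Hypothetical reconstruction of the train/validation splits quoted above:
# MNIST 60K -> 48K train / 12K val; CIFAR-10 50K -> 40K train / 10K val.
# The paper does not state the split procedure or seed; a seeded random
# split is assumed here purely for illustration.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
mnist_train, mnist_val = random_split(
    mnist, [48_000, 12_000], generator=torch.Generator().manual_seed(0)
)

cifar10 = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar_train, cifar_val = random_split(
    cifar10, [40_000, 10_000], generator=torch.Generator().manual_seed(0)
)
```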
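
The experiment-setup row quotes ADAM with learning rate 10^-3 and β1 = 0.9 for all baselines except DMI, whose learning rate is tuned over the grid {10^-3, 10^-4, 10^-5, 10^-6}. The sketch below shows one way to express that configuration in PyTorch; the `train_and_evaluate` callable and the choice of β2 = 0.999 (ADAM's default) are assumptions, since neither appears in the quoted text.

```python
# Hypothetical sketch of the reported optimizer setup, not the authors' code.
import torch


def make_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    # ADAM with lr = 1e-3 and beta1 = 0.9, as quoted; beta2 is left at its
    # default of 0.999 because the paper excerpt does not specify it.
    return torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))


# Learning-rate grid reported for tuning DMI.
DMI_LR_GRID = [1e-3, 1e-4, 1e-5, 1e-6]


def tune_dmi_lr(train_and_evaluate) -> float:
    # `train_and_evaluate(lr)` is a hypothetical callable that trains DMI with
    # the given learning rate and returns validation accuracy; the grid search
    # simply keeps the best-scoring rate.
    scores = {lr: train_and_evaluate(lr) for lr in DMI_LR_GRID}
    return max(scores, key=scores.get)
```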