Early-Learning Regularization Prevents Memorization of Noisy Labels
Authors: Sheng Liu, Jonathan Niles-Weed, Narges Razavian, Carlos Fernandez-Granda
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that early learning and memorization are fundamental phenomena in high-dimensional classification tasks, even in simple linear models, and give a theoretical explanation in this setting. Motivated by these findings, we develop a new technique for noisy classification tasks... The resulting framework is shown to provide robustness to noisy annotations on several standard benchmarks and real-world datasets, where it achieves results comparable to the state of the art. |
| Researcher Affiliation | Academia | Sheng Liu, Center for Data Science, New York University (shengliu@nyu.edu); Jonathan Niles-Weed, Center for Data Science and Courant Inst. of Mathematical Sciences, New York University (jnw@cims.nyu.edu); Narges Razavian, Department of Population Health and Department of Radiology, NYU School of Medicine (narges.razavian@nyulangone.org); Carlos Fernandez-Granda, Center for Data Science and Courant Inst. of Mathematical Sciences, New York University (cfgranda@nyu.edu) |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations but does not include structured pseudocode or algorithm blocks (a minimal sketch of the ELR objective appears after this table). |
| Open Source Code | Yes | Code to reproduce the experiments is publicly available online at https://github.com/shengliu66/ELR. |
| Open Datasets | Yes | We evaluate the proposed methodology on two standard benchmarks with simulated label noise, CIFAR-10 and CIFAR-100 [18], and two real-world datasets, Clothing1M [47] and WebVision [24]. |
| Dataset Splits | Yes | Table G.1 in the supplementary material reports additional details about the datasets, and our training, validation and test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | We focus on two variants of the proposed approach: ELR with temporal ensembling, which we call ELR, and ELR with temporal ensembling, weight averaging, two networks, and mixup data augmentation, which we call ELR+ (see Section F). |
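
For readers who want the objective in executable form, the following is a minimal PyTorch sketch of the ELR loss described in the paper: standard cross-entropy plus a regularizer λ·log(1 − ⟨p, t⟩), where the per-example targets t are a temporal-ensembling running average of the model's softmax outputs. The class name `ELRLoss`, the clamping constants, and the default values of `beta` and `lam` are illustrative assumptions, not the authors' reference implementation; consult the repository linked above for the latter.

```python
import torch
import torch.nn.functional as F

class ELRLoss(torch.nn.Module):
    """Illustrative sketch of early-learning regularization (ELR).

    Keeps a running average t_i of the softmax outputs for each training
    example (temporal ensembling) and adds lam * log(1 - <p_i, t_i>) to
    the cross-entropy loss, which rewards staying close to the model's
    early-learning predictions instead of memorizing noisy labels.
    """

    def __init__(self, num_examples, num_classes, beta=0.7, lam=3.0):
        super().__init__()
        # running-average targets, one probability vector per example
        self.register_buffer("targets", torch.zeros(num_examples, num_classes))
        self.beta = beta  # temporal-ensembling momentum (assumed default)
        self.lam = lam    # weight of the ELR term (assumed default)

    def forward(self, indices, logits, labels):
        # clamp and renormalize so the inner product stays strictly below 1
        probs = F.softmax(logits, dim=1).clamp(1e-4, 1.0 - 1e-4)
        probs = probs / probs.sum(dim=1, keepdim=True)
        with torch.no_grad():
            # temporal ensembling: t <- beta * t + (1 - beta) * p
            self.targets[indices] = (
                self.beta * self.targets[indices] + (1.0 - self.beta) * probs
            )
        ce = F.cross_entropy(logits, labels)
        # log(1 - <p, t>) decreases as p aligns with t, so minimizing the
        # total loss keeps predictions near the early-learning targets
        elr = torch.log(1.0 - (self.targets[indices] * probs).sum(dim=1)).mean()
        return ce + self.lam * elr
```

A training loop would call `criterion(batch_indices, model(x), y)`, where `batch_indices` identifies each example's row in the target buffer. ELR+ additionally combines this loss with weight averaging, a second network, and mixup data augmentation, as noted in the Experiment Setup row.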