Early-Learning Regularization Prevents Memorization of Noisy Labels

Authors: Sheng Liu, Jonathan Niles-Weed, Narges Razavian, Carlos Fernandez-Granda

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that early learning and memorization are fundamental phenomena in high-dimensional classification tasks, even in simple linear models, and give a theoretical explanation in this setting. Motivated by these findings, we develop a new technique for noisy classification tasks... The resulting framework is shown to provide robustness to noisy annotations on several standard benchmarks and real-world datasets, where it achieves results comparable to the state of the art.
Researcher Affiliation | Academia | Sheng Liu, Center for Data Science, New York University (shengliu@nyu.edu); Jonathan Niles-Weed, Center for Data Science and Courant Inst. of Mathematical Sciences, New York University (jnw@cims.nyu.edu); Narges Razavian, Department of Population Health and Department of Radiology, NYU School of Medicine (narges.razavian@nyulangone.org); Carlos Fernandez-Granda, Center for Data Science and Courant Inst. of Mathematical Sciences, New York University (cfgranda@cims.nyu.edu)
Pseudocode | No | The paper describes the methodology using text and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce the experiments is publicly available online at https://github.com/shengliu66/ELR.
Open Datasets | Yes | We evaluate the proposed methodology on two standard benchmarks with simulated label noise, CIFAR-10 and CIFAR-100 [18], and two real-world datasets, Clothing1M [47] and WebVision [24].
Dataset Splits | Yes | Table G.1 in the supplementary material reports additional details about the datasets, and our training, validation and test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1).
Experiment Setup | Yes | We focus on two variants of the proposed approach: ELR with temporal ensembling, which we call ELR, and ELR with temporal ensembling, weight averaging, two networks, and mixup data augmentation, which we call ELR+ (see Section F).
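
For context, the objective underlying both variants pairs standard cross-entropy with a regularizer computed against temporally ensembled model predictions. The sketch below, in PyTorch, is a minimal illustration of such an ELR-style loss; the hyperparameter names and default values (beta for the ensembling momentum, lam for the regularization weight), the clamping, and the per-example indexing scheme are assumptions made for illustration, not the authors' reference implementation, which is available at the repository linked above.

```python
import torch
import torch.nn.functional as F


class ELRLoss(torch.nn.Module):
    """Cross-entropy plus an early-learning regularization term (illustrative sketch)."""

    def __init__(self, num_examples, num_classes, beta=0.7, lam=3.0):
        super().__init__()
        # Temporally ensembled soft targets, one row per training example (assumption:
        # initialized to zeros and updated online as batches are seen).
        self.register_buffer("targets", torch.zeros(num_examples, num_classes))
        self.beta = beta  # momentum of the temporal ensemble
        self.lam = lam    # weight of the regularization term

    def forward(self, logits, labels, indices):
        # Clamped, renormalized softmax outputs for numerical stability.
        probs = F.softmax(logits, dim=1).clamp(1e-4, 1.0 - 1e-4)
        probs = probs / probs.sum(dim=1, keepdim=True)

        # Update the ensembled targets for this batch; no gradient flows through them.
        with torch.no_grad():
            self.targets[indices] = (
                self.beta * self.targets[indices]
                + (1.0 - self.beta) * probs.detach()
            )

        ce = F.cross_entropy(logits, labels)
        # Regularizer log(1 - <target, prediction>): minimizing it pulls predictions
        # toward the ensembled targets, counteracting memorization of noisy labels.
        dot = (self.targets[indices] * probs).sum(dim=1).clamp(max=1.0 - 1e-4)
        elr = torch.log(1.0 - dot).mean()
        return ce + self.lam * elr
```

In such a setup the training dataset would need to return each example's index along with its image and (possibly noisy) label, so the per-example ensembled target can be retrieved and updated; the quoted ELR+ variant additionally combines this type of objective with weight averaging, two networks, and mixup data augmentation.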