Robust Training under Label Noise by Over-parameterization

Authors: Sheng Liu, Zhihui Zhu, Qing Qu, Chong You

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate the effectiveness of our proposed SOP method on datasets with both synthetic (i.e., CIFAR-10 and CIFAR-100) and realistic (i.e., CIFAR-N, Clothing-1M, and WebVision) label noise.
Researcher Affiliation | Collaboration | Sheng Liu (1), Zhihui Zhu (2), Qing Qu (3), Chong You (4); (1) Center for Data Science, New York University; (2) Electrical and Computer Engineering, University of Denver; (3) Department of EECS, University of Michigan; (4) Google Research, New York City.
Pseudocode | Yes | Algorithm 1: Image classification under label noise by the method of Sparse Over-Parameterization (SOP). (A hedged PyTorch sketch of the SOP training step appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/shengliu66/SOP.
Open Datasets | Yes | Dataset descriptions. We use datasets with synthetic label noise generated from CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009). ... For datasets with realistic label noise, we test on CIFAR-10N/CIFAR-100N (Wei et al., 2021b) ... We also test on Clothing1M (Xiao et al., 2015) ... Finally, we also test on the mini WebVision dataset (Li et al., 2017).
Dataset Splits | Yes | Each dataset [CIFAR-10/100] contains 50k training images and 10k test images... Clothing-1M contains 1 million training images, 15k validation images, and 10k test images with clean labels.
Hardware Specification | Yes | Finally, we compare the training time (on a single Nvidia V100 GPU) of our method to the baseline methods in Table 5.
Software Dependencies | Yes | We implement our method with PyTorch v1.7.
Experiment Setup | Yes | Network structures & hyperparameters. We implement our method with PyTorch v1.7. For each dataset, the choices of network architectures and hyperparameters for SOP are as follows. Additional details, as well as hyperparameters for both SOP and SOP+, can be found in Appendix A.4. ... We follow (Liu et al., 2020) to use ResNet-34 and PreAct ResNet-18 architectures trained with SGD using a 0.9 momentum. The initial learning rate is 0.02, decayed by a factor of 10 at the 40th and 80th epochs... Weight decay for network parameters θ is set to 5 × 10⁻⁴. No weight decay is used for parameters {u_i, v_i}, i = 1, …, N. (A hedged optimizer-configuration sketch appears after this table.)
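
The quoted Algorithm 1 is only referenced, not reproduced, in the response above. As a reading aid, here is a minimal PyTorch sketch of the core idea the quotes describe: every training sample i carries over-parameterized variables u_i, v_i that model its (sparse) label noise, and the resulting noise estimate is added to the network's prediction before the loss. The difference-of-squares form s_i = u_i ⊙ u_i − v_i ⊙ v_i and the MSE objective are assumptions made for illustration, and names like SparseNoise and sop_step are hypothetical; consult the paper and the released code for the exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseNoise(nn.Module):
    """Per-sample noise variables u_i, v_i (hypothetical name and sketch)."""
    def __init__(self, num_samples, num_classes, init_scale=1e-8):
        super().__init__()
        # Small initialization: the implicit bias of gradient descent then
        # tends to keep the recovered noise sparse.
        self.u = nn.Parameter(init_scale * torch.randn(num_samples, num_classes))
        self.v = nn.Parameter(init_scale * torch.randn(num_samples, num_classes))

    def forward(self, idx):
        # Assumed difference-of-squares form: s_i = u_i * u_i - v_i * v_i.
        return self.u[idx] ** 2 - self.v[idx] ** 2

def sop_step(model, noise, x, y_onehot, idx, opt_theta, opt_uv):
    """One training step: fit (prediction + estimated noise) to the noisy
    label. The MSE objective here is an assumption made for illustration."""
    opt_theta.zero_grad()
    opt_uv.zero_grad()
    pred = F.softmax(model(x), dim=1) + noise(idx)
    loss = F.mse_loss(pred, y_onehot)
    loss.backward()
    opt_theta.step()  # update network parameters θ
    opt_uv.step()     # update the over-parameterized noise variables
    return loss.item()
```

The data loader is assumed to yield the sample index idx alongside each (x, y) pair, so that every example addresses its own (u_i, v_i) row.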
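
The hyperparameters quoted in the Experiment Setup row map onto two optimizers, since weight decay is applied to the network parameters θ but not to {u_i, v_i}. Below is a hedged configuration sketch for the CIFAR-10 setting, reusing the SparseNoise module from the previous sketch; the learning rate for the noise variables is a placeholder, not the paper's value (the paper tunes SOP's hyperparameters per dataset, see its Appendix A.4).

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet34

# torchvision's resnet34 stands in for the paper's ResNet-34; the paper's
# CIFAR variant of the architecture may differ in its stem layers.
model = resnet34(num_classes=10)                         # CIFAR-10: 10 classes
noise = SparseNoise(num_samples=50_000, num_classes=10)  # 50k training images

# θ: SGD with 0.9 momentum, initial lr 0.02, weight decay 5e-4 (as quoted).
opt_theta = SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=5e-4)
# {u_i, v_i}: no weight decay; this lr is a placeholder assumption.
opt_uv = SGD(noise.parameters(), lr=1.0, momentum=0.9, weight_decay=0.0)
# Decay θ's learning rate by a factor of 10 at the 40th and 80th epochs.
scheduler = MultiStepLR(opt_theta, milestones=[40, 80], gamma=0.1)
```

Calling sop_step from the previous sketch inside the batch loop, with scheduler.step() once per epoch, completes the training loop.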