Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Authors: Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, Yang Liu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of the existing solutions using CIFAR-10N and CIFAR-100N. We further proceed to study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. (A toy sketch of class-dependent noise appears after this table.)
Researcher Affiliation | Academia | University of California, Santa Cruz; TML Lab, University of Sydney; RIKEN. Emails: {jiahengwei, zwzhu, haocheng, yangliu}@ucsc.edu, tongliang.liu@sydney.edu.au, gang.niu.ml@gmail.com
Pseudocode | No | The paper details experimental procedures and refers to existing algorithms (e.g., ResNet-34, Co-teaching+), but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Starter code is provided at https://github.com/UCSC-REAL/cifar-10-100n.
Open Datasets | Yes | This work presents two new benchmark datasets, named CIFAR-10N and CIFAR-100N (jointly, CIFAR-N), which equip the training sets of CIFAR-10 and CIFAR-100 with real-world noisy labels collected from Amazon Mechanical Turk. The datasets and the leaderboard are available at http://noisylabels.com.
Dataset Splits | Yes | The CIFAR-10 dataset (Krizhevsky et al., 2009) contains 60,000 32×32 color images: 50,000 for training and 10,000 for testing. The CIFAR-100 dataset (Krizhevsky et al., 2009) contains 60,000 32×32 color images of 100 fine classes, with the same 50,000/10,000 train/test split. (A loading sketch appears after this table.)
Hardware Specification | Yes | All experiments run on a GPU cluster (roughly 500 GPUs of various types, mostly NVIDIA 2080 Ti) for training and evaluation.
Software Dependencies | No | The paper mentions a ResNet-34 model and an SGD optimizer, but does not specify software versions for libraries such as PyTorch, TensorFlow, or Python, which are necessary for full reproducibility.
Experiment Setup | Yes | The basic hyper-parameter settings for CIFAR-10N and CIFAR-100N are: mini-batch size 128, SGD optimizer, initial learning rate 0.1, momentum 0.9, weight decay 0.0005, 100 training epochs, and a learning rate decay of 0.1 at epoch 50. Standard data augmentation is applied to each dataset. (A training-configuration sketch appears after this table.)
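
To make the distinction in the Research Type row concrete, here is a minimal, illustrative sketch (not from the paper) of class-dependent label noise: each label is flipped according to a fixed transition matrix T that ignores the image itself, which is exactly the assumption the paper shows real human noise violates. The function name and the symmetric-noise choice are our own.

```python
import numpy as np

def apply_class_dependent_noise(labels, num_classes=10, noise_rate=0.2, seed=0):
    """Flip labels via a fixed transition matrix T, where
    T[i, j] = P(noisy = j | clean = i), independent of the image."""
    rng = np.random.default_rng(seed)
    # Symmetric noise: keep the label with prob. 1 - noise_rate,
    # otherwise flip uniformly to one of the other classes.
    T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return np.array([rng.choice(num_classes, p=T[y]) for y in labels])
```

Instance-dependent human noise, by contrast, cannot be summarized by any single such matrix, which is the paper's central empirical observation.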
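For the Dataset Splits row, the sketch below pairs torchvision's standard 50,000/10,000 CIFAR-10 split with the CIFAR-10N human labels. The file name `CIFAR-10_human.pt` and the dictionary keys follow the UCSC-REAL/cifar-10-100n README; treat them as assumptions to verify against the repo.

```python
import torch
from torchvision.datasets import CIFAR10

# Standard CIFAR-10 split (the noisy labels apply to the training set only).
train_set = CIFAR10(root='./data', train=True, download=True)   # 50,000 images
test_set = CIFAR10(root='./data', train=False, download=True)   # 10,000 images

# Human annotations from CIFAR-10N (keys per the repo README).
noise = torch.load('./data/CIFAR-10_human.pt')
aggre_labels = noise['aggre_label']       # majority vote of three annotators
random1_labels = noise['random_label1']   # first annotator's label
worst_labels = noise['worse_label']       # worst of the three annotations
assert len(aggre_labels) == len(train_set)  # labels cover the 50k training images
```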
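The Experiment Setup row translates directly into a training configuration. Below is a minimal PyTorch sketch; the framework choice is an assumption (the paper does not name one), and torchvision's `resnet34` stands in for the paper's CIFAR-style ResNet-34.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet34

# Hyper-parameters as listed in the Experiment Setup row.
model = resnet34(num_classes=10)  # CIFAR-10N; use num_classes=100 for CIFAR-100N
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[50], gamma=0.1)  # decay 0.1 at epoch 50

for epoch in range(100):  # 100 epochs; mini-batch size 128 belongs in the DataLoader
    # ... one pass over DataLoader(train_set, batch_size=128, shuffle=True) ...
    scheduler.step()
```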