Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations

Authors: Jiaheng Wei, Zhaowei Zhu, Hao Cheng, Tongliang Liu, Gang Niu, Yang Liu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of the existing solutions using CIFAR-10N and CIFAR-100N. We further proceed to study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. (A toy sketch of class-dependent noise appears after this table.)
Researcher Affiliation | Academia | University of California, Santa Cruz; TML Lab, University of Sydney; RIKEN. Emails: {jiahengwei, zwzhu, haocheng, yangliu}@ucsc.edu, tongliang.liu@sydney.edu.au, gang.niu.ml@gmail.com
Pseudocode | No | The paper details experimental procedures and refers to existing algorithms (e.g., ResNet-34, Co-teaching+), but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Starter code is provided at https://github.com/UCSC-REAL/cifar-10-100n.
Open Datasets | Yes | This work presents two new benchmark datasets, named CIFAR-10N and CIFAR-100N (jointly, CIFAR-N), which equip the training sets of CIFAR-10 and CIFAR-100 with real-world noisy labels collected from Amazon Mechanical Turk. The datasets and the leaderboard are available at http://noisylabels.com.
Dataset Splits | Yes | The CIFAR-10 dataset (Krizhevsky et al., 2009) contains 60,000 32×32 color images: 50,000 for training and 10,000 for testing. The CIFAR-100 dataset (Krizhevsky et al., 2009) contains 60,000 32×32 color images of 100 fine classes, with the same 50,000/10,000 train/test split. (A loading sketch appears after this table.)
Hardware Specification | Yes | All experiments run on a GPU cluster (roughly 500 GPUs of various types, mostly NVIDIA 2080 Ti) for training and evaluation.
Software Dependencies | No | The paper mentions a ResNet-34 model and an SGD optimizer, but does not specify software versions for libraries such as PyTorch, TensorFlow, or Python, which are necessary for full reproducibility.
Experiment Setup | Yes | The basic hyper-parameter settings for CIFAR-10N and CIFAR-100N are: mini-batch size 128, SGD optimizer, initial learning rate 0.1, momentum 0.9, weight decay 0.0005, 100 training epochs, and a learning rate decay of 0.1 at epoch 50. Standard data augmentation is applied to each dataset. (A training-configuration sketch appears after this table.)
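
To make the distinction in the Research Type row concrete, here is a minimal, illustrative sketch (not from the paper) of class-dependent label noise: each label is flipped according to a fixed transition matrix T that ignores the image itself, which is exactly the assumption the paper shows real human noise violates. The function name and the symmetric-noise choice are our own.

```python
import numpy as np

def apply_class_dependent_noise(labels, num_classes=10, noise_rate=0.2, seed=0):
    """Flip labels via a fixed transition matrix T, where
    T[i, j] = P(noisy = j | clean = i), independent of the image."""
    rng = np.random.default_rng(seed)
    # Symmetric noise: keep the label with prob. 1 - noise_rate,
    # otherwise flip uniformly to one of the other classes.
    T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return np.array([rng.choice(num_classes, p=T[y]) for y in labels])
```

Instance-dependent human noise, by contrast, cannot be summarized by any single such matrix, which is the paper's central empirical observation.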
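For the Dataset Splits row, the sketch below pairs torchvision's standard 50,000/10,000 CIFAR-10 split with the CIFAR-10N human labels. The file name `CIFAR-10_human.pt` and the dictionary keys follow the UCSC-REAL/cifar-10-100n README; treat them as assumptions to verify against the repo.

```python
import torch
from torchvision.datasets import CIFAR10

# Standard CIFAR-10 split (the noisy labels apply to the training set only).
train_set = CIFAR10(root='./data', train=True, download=True)   # 50,000 images
test_set = CIFAR10(root='./data', train=False, download=True)   # 10,000 images

# Human annotations from CIFAR-10N (keys per the repo README).
noise = torch.load('./data/CIFAR-10_human.pt')
aggre_labels = noise['aggre_label']       # majority vote of three annotators
random1_labels = noise['random_label1']   # first annotator's label
worst_labels = noise['worse_label']       # worst of the three annotations
assert len(aggre_labels) == len(train_set)  # labels cover the 50k training images
```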
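The Experiment Setup row translates directly into a training configuration. Below is a minimal PyTorch sketch; the framework choice is an assumption (the paper does not name one), and torchvision's `resnet34` stands in for the paper's CIFAR-style ResNet-34.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet34

# Hyper-parameters as listed in the Experiment Setup row.
model = resnet34(num_classes=10)  # CIFAR-10N; use num_classes=100 for CIFAR-100N
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[50], gamma=0.1)  # decay 0.1 at epoch 50

for epoch in range(100):  # 100 epochs; mini-batch size 128 belongs in the DataLoader
    # ... one pass over DataLoader(train_set, batch_size=128, shuffle=True) ...
    scheduler.step()
```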