Robustness to Label Noise Depends on the Shape of the Noise Distribution

Authors: Diane Oyen, Michal Kucer, Nicolas Hengartner, Har Simrat Singh

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate theoretically and empirically that classification is generally robust to uniform and class-dependent label noise until the scale of the noise exceeds a threshold that depends on the "spread" of the noise distribution, but that beyond this tipping point, classification accuracy declines rapidly. Yet we also demonstrate that such robustness to label noise is misleading, because our introduction of feature-dependent label noise shows that classification accuracy can be lowered significantly even for small amounts of label noise. We evaluate, for the first time, the damaging effect of feature-dependent label noise on recent strategies for mitigating label noise. (See the noise-model sketch after this table.)
Researcher Affiliation | Academia | Diane Oyen (Los Alamos National Lab, doyen@lanl.gov); Michal Kucer (Los Alamos National Lab); Nick Hengartner (Los Alamos National Lab); Har Simrat Singh (Los Alamos National Lab)
Pseudocode | No | The paper includes mathematical definitions and equations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code will be made available as open-source.
Open Datasets | Yes | We use the classification benchmarks CIFAR-10 and CIFAR-100 of 32x32-pixel color images in 10 or 100 classes, with 60,000 images per dataset [9].
Dataset Splits | Yes | There are 100 samples per class in the training set and 100 samples per class in the test set.
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions the use of a 'neural network with 2 hidden layers' and 'ResNet-32' as base architectures but does not list specific software libraries or frameworks with their version numbers.
Experiment Setup | Yes | There are 100 samples per class in the training set and 100 samples per class in the test set. The noise level ε varies from 0 to 1 in 0.1 increments. A neural network with 2 hidden layers is trained; further details of the architecture are given in the Supplement. The model is trained 5 times, each starting from a different random seed, with the mean and standard deviation of the accuracies reported. For all methods the base architecture is ResNet-32 [6], with more details, including computational costs and extended empirical results, in the Supplement. (See the sweep sketch after this table.)
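
The "Research Type" row above names three noise models: uniform, class-dependent, and feature-dependent label noise. As a reading aid, here is a minimal NumPy sketch of how such noise could be injected into integer labels. All function names are illustrative, and the feature-dependent variant uses classifier softmax confidence as a stand-in for the paper's construction, whose exact details are in its Supplement.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_noise(y, eps, n_classes):
    # With probability eps, replace the label with a class drawn uniformly
    # at random (the draw may land on the true class, so the effective flip
    # rate is eps * (n_classes - 1) / n_classes).
    y = y.copy()
    flip = rng.random(len(y)) < eps
    y[flip] = rng.integers(0, n_classes, size=int(flip.sum()))
    return y

def class_dependent_noise(y, transition):
    # transition[i, j] = P(noisy label = j | clean label = i); each row of
    # the matrix must sum to 1.
    n_classes = transition.shape[0]
    return np.array([rng.choice(n_classes, p=transition[c]) for c in y])

def feature_dependent_noise(y, scores, eps):
    # Flip the eps-fraction of examples whose features sit closest to a
    # decision boundary, proxied here by low softmax confidence, assigning
    # each one its runner-up class. Illustrative only, not the authors' code.
    y = y.copy()
    k = int(eps * len(y))
    hardest = np.argsort(scores.max(axis=1))[:k]   # least-confident first
    runner_up = np.argsort(scores, axis=1)[:, -2]  # second-highest score
    y[hardest] = runner_up[hardest]
    return y
```

For example, `uniform_noise(y_clean, eps=0.3, n_classes=10)` would corrupt roughly 30% of CIFAR-10 labels, while the feature-dependent variant concentrates the same budget on the hardest examples, which is why small amounts of it can be far more damaging.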
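The "Experiment Setup" row describes the reported protocol: ε swept from 0 to 1 in 0.1 increments, 5 random seeds per level, and mean and standard deviation of test accuracy reported. A minimal sketch of that sweep, assuming a user-supplied `train_and_eval(noise_level, seed)` routine (a hypothetical placeholder for the paper's 2-hidden-layer network or ResNet-32 training, not the authors' code):

```python
import numpy as np

def noise_sweep(train_and_eval, noise_levels=np.arange(0.0, 1.01, 0.1), n_seeds=5):
    # For each noise level, train from n_seeds different random seeds and
    # aggregate the resulting test accuracies as (mean, std).
    results = {}
    for eps in noise_levels:
        accs = [train_and_eval(noise_level=float(eps), seed=s) for s in range(n_seeds)]
        results[round(float(eps), 1)] = (np.mean(accs), np.std(accs))
    return results
```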