A law of adversarial risk, interpolation, and label noise
Authors: Daniel Paleka, Amartya Sanyal
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical and empirical evidence that uniform label noise is more harmful than typical real-world label noise. Finally, we show how inductive biases amplify the effect of label noise and argue the need for future work in this direction. ... We also run experiments in Figure 3, showing that mistakes done by human labelers are more benign than the same rate of uniform noise. ... Figures 3a and 3b show that, for both CIFAR10 and CIFAR100, uniform label noise is indeed worse for adversarial risk than human-generated label noise. For CIFAR-10, the model that interpolates human-generated label noise is almost as robust as the model trained on clean data. This supports our argument that real-world label noise is more benign, for adversarial risk, than uniform label noise. (A minimal sketch of injecting uniform label noise appears after the table.) |
| Researcher Affiliation | Academia | Daniel Paleka (ETH Zurich, daniel.paleka@inf.ethz.ch); Amartya Sanyal (ETH AI Center, ETH Zurich, amartya.sanyal@ai.ethz.ch) |
| Pseudocode | No | The paper contains theoretical proofs, mathematical derivations, and descriptions of experimental setups, but it does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Figures 2 and 3 refer to experiments on 'MNIST', 'CIFAR10', and 'CIFAR100' datasets. These are well-known public datasets. Additionally, it references 'CIFAR10/100-n' and cites 'Wei et al. (2022)' as the source for human-annotated labels, indicating public availability of this dataset variant. (A sketch of loading these human-annotated labels appears after the table.) |
| Dataset Splits | No | The paper mentions '50000 MNIST training samples' and 'train ResNet34 models till interpolation on these two datasets', but it does not specify explicit training, validation, or test dataset splits by percentage, sample count, or by referencing predefined standard splits with proper attribution (e.g., '80/10/10 split', or 'standard CIFAR-10 splits'). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications. It only describes the models trained and the optimization process. |
| Software Dependencies | No | The paper mentions using 'ResNet34 models' and training with the 'ADAM optimizer', but it does not specify version numbers for any software components, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train a one-hidden layer MLP with 1000 hidden units using the ADAM optimizer with a learning rate of 0.01. The decision boundary after running this for 350 epochs with a batch size of 20 is plotted in Appendix G. (A minimal sketch of this training configuration appears after the table.) |
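
The noise comparison quoted in the Research Type row relies on corrupting a fixed fraction of training labels uniformly at random. Below is a minimal sketch of that injection step, assuming PyTorch/torchvision (the paper does not state its framework) and one common convention of flipping each corrupted label to a uniformly random *other* class; the 0.4 noise rate is illustrative only.

```python
import numpy as np
import torchvision

def inject_uniform_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a noise_rate fraction of labels to a different class chosen
    uniformly at random (one common reading of 'uniform label noise')."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip_idx = rng.choice(len(labels), size=int(noise_rate * len(labels)), replace=False)
    for i in flip_idx:
        wrong = rng.integers(num_classes - 1)  # uniform over the other classes
        labels[i] = wrong if wrong < labels[i] else wrong + 1
    return labels

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
train_set.targets = inject_uniform_label_noise(
    train_set.targets, noise_rate=0.4, num_classes=10
).tolist()
```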
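
The Open Datasets row points to the human-annotated CIFAR-10/100-N labels of Wei et al. (2022). A sketch of substituting those labels for the clean ones, assuming the `CIFAR-10_human.pt` dictionary layout distributed in the CIFAR-N repository; the key names below are assumptions to verify against the actual release.

```python
import torch
import torchvision

# Assumed layout of the CIFAR-N release (github.com/UCSC-REAL/cifar-10-100n):
# a dict with keys such as 'clean_label', 'aggre_label', 'worse_label'.
human_labels = torch.load("CIFAR-10_human.pt")

train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
# The label arrays are assumed to follow torchvision's CIFAR-10 training order.
train_set.targets = [int(t) for t in human_labels["worse_label"]]
```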
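
The Experiment Setup row quotes a one-hidden-layer MLP with 1000 hidden units trained with ADAM at learning rate 0.01 for 350 epochs with batch size 20. A minimal PyTorch sketch of that configuration; the 2-D inputs, binary labels, and ReLU activation are assumptions (the paper plots a decision boundary, but this report does not quote the dataset or activation).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical 2-D toy data standing in for the paper's dataset.
X = torch.randn(500, 2)
y = (X[:, 0] * X[:, 1] > 0).long()  # arbitrary binary labels for illustration

model = nn.Sequential(
    nn.Linear(2, 1000),  # one hidden layer with 1000 units, as quoted
    nn.ReLU(),           # activation assumed; not quoted in this report
    nn.Linear(1000, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(X, y), batch_size=20, shuffle=True)

for epoch in range(350):  # 350 epochs, as quoted
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
```

Training this long with a small batch size drives the model toward interpolating (exactly fitting) the training labels, which is the regime the paper's law addresses.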