Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Understanding Intrinsic Robustness Using Label Uncertainty
Authors: Xiao Zhang, David Evans
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the CIFAR-10 and CIFAR-10H (Peterson et al., 2019) datasets demonstrate that error regions induced by state-of-the-art classification models all have high label uncertainty (Section 6.1), which validates the proposed label uncertainty constrained concentration problem. |
| Researcher Affiliation | Academia | Xiao Zhang Department of Computer Science University of Virginia EMAIL David Evans Department of Computer Science University of Virginia EMAIL |
| Pseudocode | Yes | Algorithm 1 in Appendix D gives pseudocode for the search algorithm. |
| Open Source Code | Yes | An implementation of our method, and code for reproducing our experiments, is available under an open source license from: https://github.com/xiaozhanguva/intrinsic_rob_lu. |
| Open Datasets | Yes | We conduct experiments on the CIFAR-10H dataset (Peterson et al., 2019), which contains soft labels reflecting human perceptual uncertainty for the 10,000 CIFAR-10 test images (Krizhevsky & Hinton, 2009)... all of the datasets we use are publicly available. |
| Dataset Splits | Yes | For Figure 4, we first conduct a 50/50 train-test split over the 10,000 CIFAR-10 test images (see Appendix E for experimental details). |
| Hardware Specification | Yes | All of our experiments are conducted using a GPU server with a NVIDIA GeForce RTX 2080 Ti Graphics card. |
| Software Dependencies | No | The paper describes software components like Adam optimizer, SGD optimizer, and ResNet architectures, but it does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | For standard trained classifiers, we implemented five neural network architecture... We trained the small and large model using a Adam optimizer with initial learning rate 0.005, whereas we trained the resnet18, resnet50 and wideresnet model using a SGD optimizer with initial learning rate 0.01. All models are trained using a piece-wise learning rate schedule with a decaying factor of 10 at epoch 50 and epoch 75, respectively. |
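
The learning rate schedule and the 50/50 split quoted in the table can be illustrated with a minimal sketch. This is not the authors' code (their implementation is in the linked repository). The function names `piecewise_lr` and `half_split`, the 0-indexed epochs, the convention that the decay takes effect at the start of epochs 50 and 75, and the seeded random shuffle are all assumptions made for illustration.

```python
import random


def piecewise_lr(epoch, initial_lr, milestones=(50, 75), decay_factor=10.0):
    """Return the learning rate for a 0-indexed epoch.

    Assumption: the rate is divided by `decay_factor` once for every
    milestone that has been reached (epoch >= milestone).
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return initial_lr / (decay_factor ** passed)


def half_split(num_items=10_000, seed=0):
    """Return two disjoint index lists of equal size (a 50/50 split).

    Assumption: a seeded random shuffle; the quoted text does not say
    how the split was drawn.
    """
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    half = num_items // 2
    return indices[:half], indices[half:]


if __name__ == "__main__":
    # Initial rates from the quoted setup: 0.005 (Adam), 0.01 (SGD).
    for name, lr0 in (("adam", 0.005), ("sgd", 0.01)):
        rates = [piecewise_lr(e, lr0) for e in (0, 49, 50, 74, 75)]
        print(name, rates)
    train_idx, test_idx = half_split()
    print(len(train_idx), len(test_idx))
```

Under these assumptions the SGD rate stays at 0.01 through epoch 49, drops to roughly 0.001 at epoch 50, and to roughly 0.0001 at epoch 75. In a PyTorch training loop, a step schedule of this kind is usually expressed with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[50, 75]` and `gamma=0.1`; whether the authors used that scheduler is not stated in the quoted text.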