Measuring Robustness to Natural Distribution Shifts in Image Classification
Authors: Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Informed by an evaluation of 204 ImageNet models in 213 different test conditions, we find that there is often little to no transfer of robustness from current synthetic to natural distribution shift. |
| Researcher Affiliation | Collaboration | Rohan Taori (UC Berkeley), Achal Dave (CMU), Vaishaal Shankar (UC Berkeley), Nicholas Carlini (Google Brain), Benjamin Recht (UC Berkeley), Ludwig Schmidt (UC Berkeley) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide our testbed and data as a resource for future work at https://modestyachts.github.io/imagenet-testbed/. |
| Open Datasets | Yes | ImageNet [18, 70] is a natural starting point since it has been the focus of intense research efforts over the past decade and a large number of pre-trained classification models, some with robustness interventions, are available for this task. |
| Dataset Splits | No | The paper states: 'A model f is first trained on a fixed training set. We then evaluate this model on two test sets: the standard test set (denoted S1) and the test set with a distribution shift (denoted S2).' While it refers to standard datasets and pre-trained models, which come with their own splits, the paper itself does not explicitly define the training, validation, and test splits (e.g., as percentages or sample counts) for its experiments or for the models it evaluates. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in its experiments. |
| Experiment Setup | No | The paper evaluates existing pre-trained models and focuses on their performance under distribution shifts. As such, it does not describe specific training hyperparameters, model initialization, or other system-level training settings for these models within its own experimental setup. |
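The evaluation protocol quoted under Dataset Splits can be illustrated with a minimal sketch: a fixed model f is scored on a standard test set S1 and on a distribution-shifted test set S2, and the two accuracies are compared. The model and datasets below are toy stand-ins, not the paper's actual testbed or data.

```python
# Hypothetical sketch of the two-test-set protocol: score a fixed model on a
# standard test set (S1) and on a shifted test set (S2). All names and data
# here are illustrative stand-ins, not the paper's testbed.

def accuracy(model, dataset):
    """Fraction of (input, label) pairs the model classifies correctly."""
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)

# Toy model: classifies an integer by its parity.
model = lambda x: x % 2

# S1: standard test set; S2: shifted test set where half the labels disagree
# with the parity rule, simulating a distribution shift.
s1 = [(0, 0), (1, 1), (2, 0), (3, 1)]
s2 = [(0, 0), (1, 1), (2, 1), (3, 0)]

acc_s1 = accuracy(model, s1)  # 1.0 on the standard test set
acc_s2 = accuracy(model, s2)  # 0.5 under the shift
print(f"standard: {acc_s1:.2f}, shifted: {acc_s2:.2f}")
```

The gap between the two accuracies is the quantity of interest; the paper's testbed automates this comparison across 204 models and 213 test conditions.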