Do ImageNet Classifiers Generalize to ImageNet?

Authors: Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar

ICML 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build new test sets for the CIFAR-10 and ImageNet datasets. ... We evaluate a broad range of models and find accuracy drops of 3%-15% on CIFAR-10 and 11%-14% on ImageNet. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California Berkeley, Berkeley, California, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To enable future research, we release both our new test sets and the corresponding code. ... https://github.com/modestyachts/CIFAR-10.1 and https://github.com/modestyachts/ImageNetV2 ... We wrote our own implementations for these models, which we also release publicly. ... https://github.com/modestyachts/nondeep |
| Open Datasets | Yes | We decided on CIFAR-10 and ImageNet, two of the most widely-used image classification benchmarks (Hamner, 2017). Both datasets have been the focus of intense research for almost ten years now. ... (Deng et al., 2009; Krizhevsky, 2009; Russakovsky et al., 2015). |
| Dataset Splits | Yes | We decided on a test set size of 2,000 for CIFAR-10 and 10,000 for ImageNet. ... For ImageNet, we repeat the creation process of the validation set because most papers developed and tested models on the validation set. We discuss this point in more detail in Appendix D.1. In the context of this paper, we use the terms validation set and test set interchangeably for ImageNet. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide ancillary software details with version numbers. It mentions using 'code previously published online' for deep architectures and its 'own implementations' for others, but names no specific library or framework versions. |
| Experiment Setup | No | The paper does not provide concrete experimental setup details such as hyperparameter values, training configurations, or system-level settings for the models evaluated, beyond stating that it used pre-trained models or ran training commands from the corresponding repositories. |