Do ImageNet Classifiers Generalize to ImageNet?

Authors: Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar

ICML 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build new test sets for the CIFAR-10 and ImageNet datasets. ... We evaluate a broad range of models and find accuracy drops of 3%-15% on CIFAR-10 and 11%-14% on ImageNet. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California Berkeley, Berkeley, California, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To enable future research, we release both our new test sets and the corresponding code. ... https://github.com/modestyachts/CIFAR-10.1 and https://github.com/modestyachts/ImageNetV2 ... We wrote our own implementations for these models, which we also release publicly. ... https://github.com/modestyachts/nondeep |
| Open Datasets | Yes | We decided on CIFAR-10 and ImageNet, two of the most widely-used image classification benchmarks (Hamner, 2017). Both datasets have been the focus of intense research for almost ten years now. ... (Deng et al., 2009; Krizhevsky, 2009; Russakovsky et al., 2015). |
| Dataset Splits | Yes | We decided on a test set size of 2,000 for CIFAR-10 and 10,000 for ImageNet. ... For ImageNet, we repeat the creation process of the validation set because most papers developed and tested models on the validation set. We discuss this point in more detail in Appendix D.1. In the context of this paper, we use the terms validation set and test set interchangeably for ImageNet. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide ancillary software details with version numbers. It mentions using 'code previously published online' for deep architectures and its 'own implementations' for others, but names no specific library or framework versions. |
| Experiment Setup | No | The paper does not provide concrete experimental setup details such as hyperparameter values, training configurations, or system-level settings for the models evaluated, beyond stating that it used pre-trained models or ran training commands from the corresponding repositories. |