From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a methodology for obtaining fine-grained data annotations via large-scale human studies. These annotations allow us to precisely quantify ways in which typical object recognition benchmarks fall short of capturing the underlying ground truth. We then study how such benchmark-task misalignment impacts state-of-the-art models; after all, models are often developed by treating existing datasets as the ground truth. We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)).
Researcher Affiliation | Academia | Dimitris Tsipras*1, Shibani Santurkar*1, Logan Engstrom1, Andrew Ilyas1, Aleksander Madry1. 1EECS, MIT. Correspondence to: DT <tsipras@mit.edu>, SS <shibani@mit.edu>, LE <engstrom@mit.edu>, AI <ailyas@mit.edu>, AM <madry@mit.edu>.
Pseudocode | No | The paper describes its pipeline in text and with a flowchart (Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | We release our refined ImageNet annotations at https://github.com/MadryLab/ImageNetMultiLabel. This link provides the refined dataset annotations, not the source code for the methodology described in the paper.
Open Datasets | Yes | We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)).
Dataset Splits | Yes | For our analysis, we use 10,000 images from the ImageNet validation set, i.e., 10 randomly selected images per class. Note that since the ImageNet training and validation sets were created using the same procedure, analyzing the latter is sufficient to understand systematic issues in that dataset. (A sketch of this per-class subsampling is given below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments or analysis.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers (e.g., specific libraries or solvers) needed to replicate the experiment.
Experiment Setup | No | The paper describes its annotation and analysis pipeline but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings for reproducing model training or the analysis steps.
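
The Dataset Splits row above describes subsampling the ImageNet validation set to 10 randomly selected images per class (10,000 images in total). The sketch below illustrates one way to reproduce such a split; it is not the authors' code, and the torchvision-based loading, dataset path, and random seed are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): subsample the ILSVRC2012 validation
# set to 10 randomly selected images per class, i.e., 10,000 images in total.
# The dataset path, torchvision loading, and seed are illustrative assumptions.
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision.datasets import ImageNet

val_set = ImageNet(root="/path/to/ILSVRC2012", split="val")  # assumed local copy

# Group validation-set indices by class label.
indices_by_class = defaultdict(list)
for idx, (_, label) in enumerate(val_set.samples):
    indices_by_class[label].append(idx)

# Draw 10 images uniformly at random from each class.
rng = random.Random(0)
subset_indices = [
    idx
    for indices in indices_by_class.values()
    for idx in rng.sample(indices, 10)
]
assert len(subset_indices) == 10 * len(indices_by_class)  # 10,000 for 1,000 classes

val_subset = Subset(val_set, subset_indices)
```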