From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a methodology for obtaining fine-grained data annotations via large-scale human studies. These annotations allow us to precisely quantify ways in which typical object recognition benchmarks fall short of capturing the underlying ground truth. We then study how such benchmark-task misalignment impacts state-of-the-art models; after all, models are often developed by treating existing datasets as the ground truth. We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)).
Researcher Affiliation | Academia | Dimitris Tsipras*1, Shibani Santurkar*1, Logan Engstrom1, Andrew Ilyas1, Aleksander Madry1. 1EECS, MIT. Correspondence to: DT <tsipras@mit.edu>, SS <shibani@mit.edu>, LE <engstrom@mit.edu>, AI <ailyas@mit.edu>, AM <madry@mit.edu>.
Pseudocode | No | The paper describes its pipeline in text and with a flowchart (Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | We release our refined ImageNet annotations at https://github.com/MadryLab/ImageNetMultiLabel. This link provides the refined dataset annotations, not the source code for the methodology described in the paper.
Open Datasets | Yes | We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)).
Dataset Splits | Yes | For our analysis, we use 10,000 images from the ImageNet validation set, i.e., 10 randomly selected images per class. Note that since the ImageNet training and validation sets were created using the same procedure, analyzing the latter is sufficient to understand systematic issues in that dataset. (A sketch of this per-class subsampling is given below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments or analysis.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers (e.g., specific libraries or solvers) needed to replicate the experiment.
Experiment Setup | No | The paper describes its annotation and analysis pipeline but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings for reproducing model training or the analysis steps.
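
The Dataset Splits row above describes subsampling the ImageNet validation set to 10 randomly selected images per class (10,000 images in total). The sketch below illustrates one way to reproduce such a split; it is not the authors' code, and the torchvision-based loading, dataset path, and random seed are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): subsample the ILSVRC2012 validation
# set to 10 randomly selected images per class, i.e., 10,000 images in total.
# The dataset path, torchvision loading, and seed are illustrative assumptions.
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision.datasets import ImageNet

val_set = ImageNet(root="/path/to/ILSVRC2012", split="val")  # assumed local copy

# Group validation-set indices by class label.
indices_by_class = defaultdict(list)
for idx, (_, label) in enumerate(val_set.samples):
    indices_by_class[label].append(idx)

# Draw 10 images uniformly at random from each class.
rng = random.Random(0)
subset_indices = [
    idx
    for indices in indices_by_class.values()
    for idx in rng.sample(indices, 10)
]
assert len(subset_indices) == 10 * len(indices_by_class)  # 10,000 for 1,000 classes

val_subset = Subset(val_set, subset_indices)
```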