From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
Authors: Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop a methodology for obtaining fine-grained data annotations via large-scale human studies. These annotations allow us to precisely quantify ways in which typical object recognition benchmarks fall short of capturing the underlying ground truth. We then study how such benchmark-task misalignment impacts state-of-the-art models; after all, models are often developed by treating existing datasets as the ground truth. We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)). |
| Researcher Affiliation | Academia | Dimitris Tsipras * 1 Shibani Santurkar * 1 Logan Engstrom 1 Andrew Ilyas 1 Aleksander Madry 1 1 EECS, MIT. Correspondence to: DT <tsipras@mit.edu>, SS <shibani@mit.edu>, LE <engstrom@mit.edu>, AI <ailyas@mit.edu>, AM <madry@mit.edu>. |
| Pseudocode | No | The paper describes its pipeline in text and with a flowchart (Figure 2) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | We release our refined ImageNet annotations at https://github.com/MadryLab/ImageNetMultiLabel. This link provides the refined dataset annotations, not the source code for the methodology described in the paper. |
| Open Datasets | Yes | We focus our exploration on the ImageNet dataset (Deng et al., 2009) (specifically, the ILSVRC2012 object recognition task (Russakovsky et al., 2015)). |
| Dataset Splits | Yes (see the sampling sketch after the table) | For our analysis, we use 10,000 images from the ImageNet validation set, i.e., 10 randomly selected images per class. Note that since the ImageNet training and validation sets were created using the same procedure, analyzing the latter is sufficient to understand systematic issues in that dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments or analysis. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes its annotation and analysis pipeline but does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings for reproducing model training or their analysis steps. |
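
The Dataset Splits row above references a 10-images-per-class subset of the ImageNet validation set (10,000 images in total). Below is a minimal sketch of how such a subset could be drawn; the directory layout (one folder per WordNet ID, as in the common torchvision `ImageFolder` arrangement), the paths, and the fixed seed are illustrative assumptions, not the authors' actual sampling procedure.

```python
import os
import random

# Assumed layout: VAL_DIR/<wnid>/<image files>, one subdirectory per class.
# Adjust VAL_DIR to wherever your ImageNet validation set lives.
VAL_DIR = "/path/to/imagenet/val"
IMAGES_PER_CLASS = 10  # the paper uses 10 randomly selected images per class

random.seed(0)  # fixed seed only so this sketch is repeatable

subset = {}
for wnid in sorted(os.listdir(VAL_DIR)):
    class_dir = os.path.join(VAL_DIR, wnid)
    if not os.path.isdir(class_dir):
        continue
    files = sorted(os.listdir(class_dir))
    # Each ILSVRC2012 validation class has 50 images, so sampling 10 is safe.
    subset[wnid] = random.sample(files, IMAGES_PER_CLASS)

# With 1,000 classes this yields the 10,000-image analysis set described above.
print(sum(len(v) for v in subset.values()), "images selected")
```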