Evaluating Machine Accuracy on ImageNet

Authors: Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate a wide range of ImageNet models with five trained human labelers. In our year-long experiment, trained humans first annotated 40,000 images from the ImageNet and ImageNetV2 test sets with multi-class labels to enable a semantically coherent evaluation. Then we measured the classification accuracy of the five trained humans on the full task with 1,000 classes.
Researcher Affiliation | Collaboration | University of California, Berkeley and Google Brain.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit access to source code for the methodology described.
Open Datasets | Yes | ImageNet, the most influential data set in machine learning, has helped to shape the landscape of machine learning research since its release in 2009 (Deng et al., 2009; Russakovsky et al., 2015). The images are drawn from both the original ImageNet validation set and the ImageNetV2 replication study of Recht et al. (2019).
Dataset Splits | Yes | Labelers A, B, and C provided multi-label annotations for a subset of size 20,000 from the ImageNet validation set and 20,683 images from all three ImageNetV2 test sets collected by Recht et al. (2019). ... All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation.
Hardware Specification | No | The paper does not specify any hardware used for running its experiments.
Software Dependencies | No | The paper does not list any software dependencies with specific version numbers.
Experiment Setup | Yes | All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation. ... The only resources the labelers had access to during evaluation were 100 randomly sampled images from the ImageNet training set for each class, and the labeling guide. ... The participants spent a median of 26 seconds per image, with a median labeling time of 36 hours for the entire labeling task.
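
The multi-class annotations described above are what make the human-versus-model comparison well defined: a prediction is counted as correct when it falls within the set of labels annotated as valid for that image, rather than requiring an exact match to a single target label. The sketch below illustrates that scoring rule only; the function name and data layout are assumptions for illustration, not the authors' evaluation code.

```python
# Minimal sketch (not the authors' code) of multi-label accuracy:
# a prediction is correct if it appears in the annotated set of valid
# labels for that image.
from typing import Dict, Set


def multi_label_accuracy(predictions: Dict[str, int],
                         valid_labels: Dict[str, Set[int]]) -> float:
    """Fraction of images whose predicted class is in the annotated label set."""
    correct = sum(1 for img_id, pred in predictions.items()
                  if pred in valid_labels[img_id])
    return correct / len(predictions)


# Hypothetical toy usage (image ids and class indices are made up):
preds = {"img_0001": 207, "img_0002": 281}
labels = {"img_0001": {207, 208}, "img_0002": {285}}
print(multi_label_accuracy(preds, labels))  # 0.5
```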