Evaluating Machine Accuracy on ImageNet
Authors: Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate a wide range of ImageNet models with five trained human labelers. In our year-long experiment, trained humans first annotated 40,000 images from the ImageNet and ImageNetV2 test sets with multi-class labels to enable a semantically coherent evaluation. Then we measured the classification accuracy of the five trained humans on the full task with 1,000 classes. |
| Researcher Affiliation | Collaboration | University of California, Berkeley; Google Brain. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit access to source code for the methodology described. |
| Open Datasets | Yes | ImageNet, the most influential dataset in machine learning, has helped to shape the landscape of machine learning research since its release in 2009 (Deng et al., 2009; Russakovsky et al., 2015). The images are drawn from both the original ImageNet validation set and the ImageNetV2 replication study of Recht et al. (2019). |
| Dataset Splits | Yes | Labelers A, B, and C provided multi-label annotations for a subset of size 20,000 from the ImageNet validation set and 20,683 images from all three ImageNetV2 test sets collected by Recht et al. (2019). ... All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation. |
| Hardware Specification | No | The paper does not specify any hardware used for running its experiments. |
| Software Dependencies | No | The paper does not list any software dependencies with specific version numbers. |
| Experiment Setup | Yes | All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation. ... The only resources the labelers had access to during evaluation were 100 randomly sampled images from the ImageNet training set for each class, and the labeling guide. ... The participants spent a median of 26 seconds per image, with a median labeling time of 36 hours for the entire labeling task. (A sketch of the split and scoring protocol follows this table.) |
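To make the dataset-split and evaluation rows above concrete, the sketch below shows one way the reported protocol could be reproduced: hold out 20,000 of the 50,000 ImageNet validation images for the final evaluation, reserve the remaining 30,000 for labeler training, and score predictions with multi-label accuracy (a prediction counts as correct if it falls in the image's set of acceptable labels). This is a minimal illustration, not the authors' code; the file name, label format, and function names are assumptions.

```python
"""Minimal sketch of the split-and-score protocol described in the table.
All file names and data formats are hypothetical."""
import json
import random

random.seed(0)

# Hypothetical inputs: the 50,000 ImageNet validation image IDs and a
# multi-label map {image_id: [acceptable class labels]} from human annotators.
val_ids = [f"ILSVRC2012_val_{i:08d}" for i in range(1, 50001)]
with open("human_multi_labels.json") as f:  # assumed annotation file
    multi_labels = json.load(f)

# Split: 20,000 images held out for the final evaluation,
# 30,000 available for labeler training.
shuffled = random.sample(val_ids, len(val_ids))
eval_ids, train_ids = shuffled[:20000], shuffled[20000:]


def multi_label_accuracy(predictions, labels, image_ids):
    """predictions: {image_id: predicted class};
    labels: {image_id: iterable of acceptable classes}.
    A prediction is correct if it is one of the acceptable classes."""
    correct = sum(predictions[i] in set(labels[i]) for i in image_ids)
    return correct / len(image_ids)


# Usage (with a hypothetical model_predictions dict):
# acc = multi_label_accuracy(model_predictions, multi_labels, eval_ids)
```

The same `multi_label_accuracy` call would be applied to both model predictions and the human labelers' submitted labels, which is what allows the paper's direct human-vs-machine comparison on a common metric.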