Evaluating Machine Accuracy on ImageNet

Authors: Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate a wide range of ImageNet models with five trained human labelers. In our year-long experiment, trained humans first annotated 40,000 images from the ImageNet and ImageNetV2 test sets with multi-class labels to enable a semantically coherent evaluation. Then we measured the classification accuracy of the five trained humans on the full task with 1,000 classes.
Researcher Affiliation | Collaboration | University of California, Berkeley and Google Brain.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit access to source code for the methodology described.
Open Datasets | Yes | ImageNet, the most influential data set in machine learning, has helped to shape the landscape of machine learning research since its release in 2009 (Deng et al., 2009; Russakovsky et al., 2015). The images are drawn from both the original ImageNet validation set and the ImageNetV2 replication study of Recht et al. (2019).
Dataset Splits | Yes | Labelers A, B, and C provided multi-label annotations for a subset of size 20,000 from the ImageNet validation set and 20,683 images from all three ImageNetV2 test sets collected by Recht et al. (2019). ... All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation.
Hardware Specification | No | The paper does not specify any hardware used for running its experiments.
Software Dependencies | No | The paper does not list any software dependencies with specific version numbers.
Experiment Setup | Yes | All training was carried out using the 30,000 ImageNet validation images that would not be used for the final evaluation. ... The only resources the labelers had access to during evaluation were 100 randomly sampled images from the ImageNet training set for each class, and the labeling guide. ... The participants spent a median of 26 seconds per image, with a median labeling time of 36 hours for the entire labeling task.
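
The multi-class annotations described above are what make the human-versus-model comparison well defined: a prediction is counted as correct when it falls within the set of labels annotated as valid for that image, rather than requiring an exact match to a single target label. The sketch below illustrates that scoring rule only; the function name and data layout are assumptions for illustration, not the authors' evaluation code.

```python
# Minimal sketch (not the authors' code) of multi-label accuracy:
# a prediction is correct if it appears in the annotated set of valid
# labels for that image.
from typing import Dict, Set


def multi_label_accuracy(predictions: Dict[str, int],
                         valid_labels: Dict[str, Set[int]]) -> float:
    """Fraction of images whose predicted class is in the annotated label set."""
    correct = sum(1 for img_id, pred in predictions.items()
                  if pred in valid_labels[img_id])
    return correct / len(predictions)


# Hypothetical toy usage (image ids and class indices are made up):
preds = {"img_0001": 207, "img_0002": 281}
labels = {"img_0001": {207, 208}, "img_0002": {285}}
print(multi_label_accuracy(preds, labels))  # 0.5
```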