Identifying Statistical Bias in Dataset Replication
Authors: Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study ImageNet-v2, a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy, even after controlling for selection frequency, a human-in-the-loop measure of data quality. We show that after remeasuring selection frequencies and correcting for statistical bias, only an estimated 3.6% ± 1.5% of the original 11.7% ± 1.0% accuracy drop remains unaccounted for. |
| Researcher Affiliation | Academia | ¹MIT ²UC Berkeley. Correspondence to: Logan Engstrom <engstrom@mit.edu>. |
| Pseudocode | Yes | We provide further detail (including pseudocode) on the fitting process for p_i(s(x); θ) in Appendix F. |
| Open Source Code | Yes | Code for our study is publicly available¹. ¹ https://git.io/data-rep-analysis |
| Open Datasets | Yes | ImageNet (Deng et al., 2009; Russakovsky et al., 2015) (which we also refer to as ImageNet-v1 or v1) is one of the most widely used datasets in computer vision. |
| Dataset Splits | No | The paper refers to pre-existing datasets like ImageNet and ImageNet-v2, and their respective 'test sets', but does not explicitly provide details about training, validation, and test splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper refers to using 'Amazon Mechanical Turk (MTurk)' and mentions 'PyTorch' in a reference's title, but does not provide specific version numbers for any software dependencies used in their experiments. |
| Experiment Setup | Yes | In these tasks, MTurk annotators were shown grids of 48 images at a time, each corresponding to an ImageNet class. ... Each image was seen by 40 distinct annotators... We opt to use mixtures of beta distributions as the family p_i(·; θ) ... a cubic spline |
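
The experiment setup quoted above rests on two quantities: an image's observed selection frequency (the fraction of its annotators who selected it) and a mixture of beta distributions used as a density over selection frequencies. The following is a minimal sketch of both, not the authors' implementation. The function names, the example votes, and the mixture weights and parameters are all hypothetical, and the fitting procedure the paper describes in Appendix F is not reproduced here.

```python
import math


def selection_frequency(votes):
    """Fraction of annotators who selected the image (votes are 0 or 1)."""
    return sum(votes) / len(votes)


def beta_pdf(s, a, b):
    """Density of Beta(a, b) at s in (0, 1), via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log(1 - s))


def beta_mixture_pdf(s, weights, params):
    """Density of a mixture of beta distributions at s.

    weights: mixture weights summing to 1
    params:  list of (a, b) shape parameters, one pair per component
    """
    return sum(w * beta_pdf(s, a, b) for w, (a, b) in zip(weights, params))


# Hypothetical example: 30 of 40 annotators selected the image.
votes = [1] * 30 + [0] * 10
freq = selection_frequency(votes)

# Hypothetical two-component mixture; these are NOT fitted values from the paper.
weights = [0.5, 0.5]
params = [(2.0, 5.0), (5.0, 2.0)]
density = beta_mixture_pdf(freq, weights, params)
```

With only 40 annotators per image, the observed frequency is a noisy estimate of the underlying one, which is why a density model over selection frequencies is useful when correcting for statistical bias.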