Overinterpretation reveals image classification model pathologies

Authors: Brandon Carter, Siddhartha Jain, Jonas W. Mueller, David Gifford

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we demonstrate that neural networks trained on CIFAR-10 and ImageNet suffer from overinterpretation, and we find models on CIFAR-10 make confident predictions even when 95% of input images are masked and humans cannot discern salient features in the remaining pixel-subsets. We introduce Batched Gradient SIS, a new method for discovering sufficient input subsets in complex datasets, and use this method to show the sufficiency of border pixels in ImageNet for training and testing. We train new classifiers solely on these pixel-subsets of training images and evaluate accuracy on the corresponding pixel-subsets of test images to determine whether such pixel-subsets are statistically valid for generalization in the benchmark. Table 1: Accuracy of CIFAR-10 classifiers trained and evaluated on full images, 5% backward selection (BS) pixel-subsets, and 5% random pixel-subsets.
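The backward-selection idea behind Batched Gradient SIS can be illustrated with a small sketch. The exact algorithm lives in the paper's Section S1; the version below only captures the general scheme, greedily masking batches of the least-salient remaining pixels until a target fraction (e.g. 5%) survives. `saliency_fn` is a hypothetical stand-in for the per-pixel gradient magnitude of the model's confidence, and `batch_frac` is an assumed batching parameter, not the paper's value.

```python
import numpy as np

def backward_select_pixels(image, saliency_fn, keep_frac=0.05, batch_frac=0.1):
    """Greedy batched backward selection (illustrative sketch, not the
    paper's exact Batched Gradient SIS): repeatedly zero out the batch of
    currently unmasked pixels with the smallest saliency until only
    keep_frac of pixels remain.

    saliency_fn(masked_image_flat) -> per-pixel saliency; a stand-in for
    the gradient magnitude of the model's confidence w.r.t. each pixel.
    """
    flat = image.flatten().astype(float)
    alive = np.ones_like(flat, dtype=bool)
    target = max(1, int(round(keep_frac * flat.size)))
    while alive.sum() > target:
        sal = saliency_fn(np.where(alive, flat, 0.0))
        sal = np.where(alive, sal, np.inf)           # only rank unmasked pixels
        n_drop = min(int(alive.sum()) - target,
                     max(1, int(batch_frac * alive.sum())))
        drop = np.argsort(sal)[:n_drop]              # least-salient batch
        alive[drop] = False
    masked = np.where(alive, flat, 0.0).reshape(image.shape)
    return masked, alive.reshape(image.shape)
```

With a linear "model" whose saliency is just a fixed weight per pixel, the surviving 5% subset is exactly the highest-weight pixels, which mirrors how the method isolates the pixels the classifier actually relies on.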
Researcher Affiliation | Collaboration | Brandon Carter (MIT CSAIL, bcarter@csail.mit.edu); Siddhartha Jain; Jonas Mueller (Amazon Web Services); David Gifford (MIT CSAIL, gifford@mit.edu)
Pseudocode | No | The complete Batched Gradient SIS algorithm is presented in Section S1 of the supplementary material, which is not part of the main paper text provided.
Open Source Code | Yes | Code for this paper is available at: https://github.com/gifford-lab/overinterpretation.
Open Datasets | Yes | CIFAR-10 [2] and ImageNet [3] have become two of the most popular image classification benchmarks. We also use the CIFAR-10-C dataset [29] to evaluate the extent to which our CIFAR-10 models generalize to out-of-distribution (OOD) data.
Dataset Splits | Yes | Inception v3 trained on 10% pixel-subsets of ImageNet training images achieves 71.4% top-1 accuracy (mean over 5 runs) on the corresponding pixel-subset ImageNet validation set (Table S7).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models or memory amounts) used for running the experiments are provided in the paper.
Software Dependencies | No | No software dependencies with version numbers (e.g., a specific PyTorch release) are stated for replication. PyTorch is cited, but no version is given in the context of the experiments.
Experiment Setup | Yes | We produce sparse variants of all train and test set images, retaining 5% (CIFAR-10) or 10% (ImageNet) of the pixels in each image. We apply input dropout [43] to both train and test images, retaining each input pixel with probability p = 0.8 and setting dropped pixels to zero.
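The input-dropout step quoted above can be sketched in a few lines. This is a minimal NumPy illustration, assuming NHWC-shaped image batches and one keep/drop decision per spatial pixel shared across channels; the quoted setup does not spell out either detail.

```python
import numpy as np

def input_dropout(images, p_keep=0.8, rng=None):
    """Pixel-level input dropout as described in the setup: each pixel is
    kept with probability p_keep and dropped pixels are set to zero.
    Applied to both train and test images. (Sketch; channel sharing and
    NHWC layout are assumptions, not stated in the paper.)"""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, c = images.shape
    # one Bernoulli(p_keep) draw per spatial location, broadcast over channels
    mask = rng.random((n, h, w, 1)) < p_keep
    return images * mask
```

Over a batch of all-ones images, the mean of the output lands near `p_keep`, and every channel of a dropped pixel is zeroed together.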