Noise or Signal: The Role of Image Backgrounds in Object Recognition

Authors: Kai Yuanqing Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signal on ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds (up to 88% of the time with adversarially chosen backgrounds), and (c) more accurate models tend to depend on backgrounds less.
Researcher Affiliation Academia Kai Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry, MIT {kaix,engstrom,ailyas,madry}@mit.edu
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes The code and datasets are publicly available for others to use in this repository: https://github.com/MadryLab/backgrounds_challenge.
Open Datasets Yes Base dataset: ImageNet-9. We organize a subset of ImageNet into a new dataset with nine coarse-grained classes and call it ImageNet-9 (IN-9). To create it, we group together ImageNet classes sharing an ancestor in the WordNet (Miller, 1995) hierarchy. We use coarse-grained classes because there are not enough images with annotated bounding boxes (which we need to disentangle backgrounds and foregrounds) to use the standard labels. The resulting IN-9 dataset is class-balanced and has 45405 training images and 4050 testing images. Larger dataset: IN-9L. We also create a dataset called IN-9L that consists of all the images in ImageNet corresponding to the classes in ORIGINAL (rather than just the images that have associated bounding boxes). This dataset has about 180k training images in total.
Dataset Splits No The paper provides details for training and testing splits ('45405 training images and 4050 testing images') but does not explicitly mention a distinct validation dataset split or its size for their experiments.
Hardware Specification No The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running its experiments.
Software Dependencies No The paper mentions OpenCV as an implemented tool for GrabCut but does not provide a specific version number for OpenCV or any other software dependencies crucial for replication.
Experiment Setup Yes For all models, we use fairly standard training settings for ImageNet-style models. We train for 200 epochs using SGD with a batch size of 256, a learning rate of 0.1 (with learning rate drops every 50 epochs), a momentum parameter of 0.9, a weight decay of 1e-4, and data augmentation (random resized crop, random horizontal flip, and color jitter). Unless specified, we always use a standard ResNet-50 architecture (He et al., 2016).
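The "adversarially chosen backgrounds" metric quoted in the abstract above counts a foreground as fooled if any candidate background flips the model's prediction. A minimal sketch, assuming a hypothetical `predict(foreground, background)` function that returns the model's label for the composited image:

```python
def adversarial_bg_error(predict, labeled_foregrounds, backgrounds):
    """Fraction of foregrounds misclassified under the worst-case background.

    `predict(fg, bg)` is a hypothetical classifier interface: it returns the
    predicted label for foreground `fg` pasted onto background `bg`. A
    foreground counts as fooled if ANY background causes a wrong prediction.
    """
    fooled = sum(
        any(predict(fg, bg) != label for bg in backgrounds)
        for fg, label in labeled_foregrounds
    )
    return fooled / len(labeled_foregrounds)
```

For example, a degenerate model that always predicts the background's own class is fooled by every foreground whenever any mismatched background is available.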
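The IN-9 construction described above relabels fine-grained ImageNet classes with a shared WordNet ancestor. A minimal sketch of that regrouping, with a toy fine-to-coarse mapping standing in for the WordNet hierarchy (the class names here are illustrative, not the actual IN-9 classes):

```python
# Toy fine-to-coarse mapping; the real grouping walks the WordNet hierarchy
# to find a common ancestor for each set of ImageNet synsets.
COARSE_OF = {
    "golden_retriever": "dog",
    "beagle": "dog",
    "robin": "bird",
    "goldfinch": "bird",
}

def to_coarse(fine_labels):
    """Relabel fine-grained class names with their coarse ancestor class."""
    return [COARSE_OF[label] for label in fine_labels]
```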
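The step schedule in the experiment setup (base learning rate 0.1, drops every 50 epochs over 200 epochs) can be sketched as below; the 10x drop factor is an assumption, since the paper does not state the drop magnitude.

```python
def lr_at_epoch(epoch, base_lr=0.1, drop_every=50, factor=0.1):
    """Step learning-rate schedule: multiply the base LR by `factor`
    once every `drop_every` epochs (a 10x drop is assumed here)."""
    return base_lr * factor ** (epoch // drop_every)
```

With these defaults, epochs 0-49 train at 0.1, epochs 50-99 at 0.01, and so on through epoch 199.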