Noise or Signal: The Role of Image Backgrounds in Object Recognition

Authors: Kai Yuanqing Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We assess the tendency of state-of-the-art object recognition models to depend on signals from image backgrounds. We create a toolkit for disentangling foreground and background signal on ImageNet images, and find that (a) models can achieve non-trivial accuracy by relying on the background alone, (b) models often misclassify images even in the presence of correctly classified foregrounds (up to 88% of the time with adversarially chosen backgrounds), and (c) more accurate models tend to depend on backgrounds less.
Researcher Affiliation Academia Kai Xiao, Logan Engstrom, Andrew Ilyas, Aleksander Madry, MIT {kaix,engstrom,ailyas,madry}@mit.edu
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes The code and datasets are publicly available for others to use in this repository: https://github.com/MadryLab/backgrounds_challenge.
Open Datasets Yes Base dataset: ImageNet-9. We organize a subset of ImageNet into a new dataset with nine coarse-grained classes and call it ImageNet-9 (IN-9). To create it, we group together ImageNet classes sharing an ancestor in the WordNet (Miller, 1995) hierarchy. We use coarse-grained classes because there are not enough images with annotated bounding boxes (which we need to disentangle backgrounds and foregrounds) to use the standard labels. The resulting IN-9 dataset is class-balanced and has 45405 training images and 4050 testing images. Larger dataset: IN-9L. We also create a dataset called IN-9L that consists of all the images in ImageNet corresponding to the classes in ORIGINAL (rather than just the images that have associated bounding boxes). This dataset has about 180k training images in total.
Dataset Splits No The paper provides details for training and testing splits ('45405 training images and 4050 testing images') but does not explicitly mention a distinct validation dataset split or its size for their experiments.
Hardware Specification No The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running its experiments.
Software Dependencies No The paper mentions OpenCV as an implemented tool for GrabCut but does not provide a specific version number for OpenCV or any other software dependencies crucial for replication.
Experiment Setup Yes For all models, we use fairly standard training settings for ImageNet-style models. We train for 200 epochs using SGD with a batch size of 256, a learning rate of 0.1 (with learning rate drops every 50 epochs), a momentum parameter of 0.9, a weight decay of 1e-4, and data augmentation (random resized crop, random horizontal flip, and color jitter). Unless specified, we always use a standard ResNet-50 architecture (He et al., 2016).
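The "adversarially chosen backgrounds" metric quoted in the abstract above counts a foreground as fooled if any candidate background flips the model's prediction. A minimal sketch, assuming a hypothetical `predict(foreground, background)` function that returns the model's label for the composited image:

```python
def adversarial_bg_error(predict, labeled_foregrounds, backgrounds):
    """Fraction of foregrounds misclassified under the worst-case background.

    `predict(fg, bg)` is a hypothetical classifier interface: it returns the
    predicted label for foreground `fg` pasted onto background `bg`. A
    foreground counts as fooled if ANY background causes a wrong prediction.
    """
    fooled = sum(
        any(predict(fg, bg) != label for bg in backgrounds)
        for fg, label in labeled_foregrounds
    )
    return fooled / len(labeled_foregrounds)
```

For example, a degenerate model that always predicts the background's own class is fooled by every foreground whenever any mismatched background is available.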
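The IN-9 construction described above relabels fine-grained ImageNet classes with a shared WordNet ancestor. A minimal sketch of that regrouping, with a toy fine-to-coarse mapping standing in for the WordNet hierarchy (the class names here are illustrative, not the actual IN-9 classes):

```python
# Toy fine-to-coarse mapping; the real grouping walks the WordNet hierarchy
# to find a common ancestor for each set of ImageNet synsets.
COARSE_OF = {
    "golden_retriever": "dog",
    "beagle": "dog",
    "robin": "bird",
    "goldfinch": "bird",
}

def to_coarse(fine_labels):
    """Relabel fine-grained class names with their coarse ancestor class."""
    return [COARSE_OF[label] for label in fine_labels]
```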
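The step schedule in the experiment setup (base learning rate 0.1, drops every 50 epochs over 200 epochs) can be sketched as below; the 10x drop factor is an assumption, since the paper does not state the drop magnitude.

```python
def lr_at_epoch(epoch, base_lr=0.1, drop_every=50, factor=0.1):
    """Step learning-rate schedule: multiply the base LR by `factor`
    once every `drop_every` epochs (a 10x drop is assumed here)."""
    return base_lr * factor ** (epoch // drop_every)
```

With these defaults, epochs 0-49 train at 0.1, epochs 50-99 at 0.01, and so on through epoch 199.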