Improving robustness against common corruptions by covariate shift adaptation

Authors: Steffen Schneider, Evgenia Rusak, Luisa Eck, Oliver Bringmann, Wieland Brendel, Matthias Bethge

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Replacing the activation statistics estimated by batch normalization on the training set with the statistics of the corrupted images consistently improves robustness across 25 popular computer vision models. Using the corrected statistics, ResNet-50 reaches 62.2% mCE on ImageNet-C compared to 76.7% without adaptation. With the more robust DeepAugment+AugMix model, the state of the art for a ResNet-50 model improves from 53.6% mCE to 45.4% mCE.
Researcher Affiliation | Academia | Steffen Schneider, University of Tübingen & IMPRS-IS; Evgenia Rusak, University of Tübingen & IMPRS-IS; Luisa Eck, LMU Munich; Oliver Bringmann, University of Tübingen; Wieland Brendel, University of Tübingen; Matthias Bethge, University of Tübingen
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Online version and code: domainadaptation.org/batchnorm
Open Datasets | Yes | All models are trained on the ILSVRC2012 subset of ImageNet, comprising 1.2 million training images across 1000 classes [7, 8], and evaluated on the ImageNet-C benchmark [IN-C; 2].
Dataset Splits | Yes | The ImageNet-C benchmark [IN-C; 2] consists of 15 test corruptions and four hold-out corruptions, each applied at five severity levels to the 50,000 images of the ILSVRC2012 validation set [8].
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor speeds, or memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions software components such as the torchvision library and PyTorch (in reference [10]), but does not provide version numbers for these or other key dependencies.
Experiment Setup | Yes | For IN, all images are resized to 256x256 px and the center 224x224 px crop is taken; IN-C images are already cropped. Color values are centered and re-scaled with µ_RGB = [0.485, 0.456, 0.406] and σ_RGB = [0.229, 0.224, 0.225]. In the ad hoc scenario, n = 1; in the full adaptation scenario, n = 50,000; in the partial adaptation scenario, n = 8. The hyperparameter N controls the trade-off between source and estimated target statistics; the authors suggest N ∈ [8, 128] for practical applications with small n < 32. For partial adaptation, N is chosen from {2^0, ..., 2^10} and the optimal value is selected on the hold-out corruption mCE. All models are adapted using n = 50,000 (vanilla) or n = 4096 (all other models) and N = 0.
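The core technique summarized in the Research Type row, re-estimating batch-normalization statistics on the corrupted inputs instead of reusing the training-set statistics, can be sketched in plain Python. This is a minimal one-feature illustration under covariate shift, not the paper's PyTorch implementation; all names and values here are illustrative:

```python
import math

def normalize(x, mean, var, eps=1e-5):
    """BatchNorm-style normalization with given statistics."""
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

# Source (training-set) statistics stored by batch normalization.
mu_s, var_s = 0.0, 1.0

# A corrupted test batch whose input distribution has shifted.
corrupted = [1.5, 2.0, 2.5, 3.0]

# Adaptation: re-estimate mean and variance on the corrupted batch.
n = len(corrupted)
mu_t = sum(corrupted) / n
var_t = sum((x - mu_t) ** 2 for x in corrupted) / n

# Normalizing with target statistics recenters the shifted inputs;
# normalizing with the stale source statistics would not.
adapted = normalize(corrupted, mu_t, var_t)
```

With the source statistics, the normalized batch keeps its shift (mean around 2.25); with the re-estimated target statistics, the batch is recentered to zero mean, which is the effect the adaptation exploits.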
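The trade-off hyperparameter N in the Experiment Setup row can be read as treating the source statistics as if they came from N pseudo-samples, blended with statistics estimated from n real target samples. A plain-Python sketch under that reading (the function name and the convex-combination form are my assumption, not taken verbatim from the paper):

```python
def blend_stats(mu_s, var_s, mu_t, var_t, n, N):
    """Blend source and target batch-norm statistics.

    N acts as a pseudo-sample count for the source statistics and
    n is the number of real target samples. N = 0 discards the
    source statistics entirely (full adaptation); large N keeps
    the training-set statistics mostly intact.
    """
    w = N / (N + n)
    return (w * mu_s + (1 - w) * mu_t,
            w * var_s + (1 - w) * var_t)

# n = 8 target samples with N = 16 source pseudo-samples keeps
# two thirds of the weight on the training-set statistics.
mu, var = blend_stats(mu_s=0.0, var_s=1.0, mu_t=3.0, var_t=4.0, n=8, N=16)
```

Setting N = 0 reproduces the full-adaptation case reported in the table (only the target statistics are used), while the suggested range N ∈ [8, 128] for n < 32 keeps most of the weight on the source statistics when only a few target samples are available.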