Robustness to corruption in pre-trained Bayesian neural networks

Authors: Xi Wang, Laurence Aitchison

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using pre-trained HMC samples, ShiftMatch gives strong performance improvements on CIFAR-10-C, outperforms EmpCov priors (though ShiftMatch uses extra information from a minibatch of corrupted test points), and is perhaps the first Bayesian method capable of convincingly outperforming plain deep ensembles.
Researcher Affiliation | Academia | Xi Wang, College of Information and Computer Science, University of Massachusetts Amherst, xwang3@cs.umass.edu; Laurence Aitchison, Department of Computer Science, University of Bristol, laurence.aitchison@bristol.ac.uk
Pseudocode | Yes | Algorithm 1: End-to-end procedure of ShiftMatch on a pre-trained BNN
Open Source Code | Yes | Code available at https://github.com/xidulu/ShiftMatch
Open Datasets | Yes | First, we applied ShiftMatch to the HMC samples from Izmailov et al. (2021b) for a large-scale Bayesian ResNet trained on CIFAR-10 and tested on CIFAR-10-C (Hendrycks & Dietterich, 2019)... Second, we show that ShiftMatch performs better than EmpCov priors on small CNNs from Izmailov et al. (2021a) trained on MNIST and tested on MNIST-C (Mu & Gilmer, 2019)... Finally, we show that ShiftMatch can be applied on a large pre-trained non-Bayesian network, where it improved performance on ImageNet relative to test-time batchnorm.
Dataset Splits | Yes | They used a ResNet-20 with only 40,960 of the 50,000 training samples (in order to evenly share the data across the TPU devices), and to ensure deterministic likelihood evaluations (which is necessary for HMC), turned off data augmentation and data subsampling (i.e. full batch training), and used filter response normalization (FRN) (Singh & Krishnan, 2020) rather than batch normalization (Ioffe & Szegedy, 2015).
Hardware Specification | Yes | In contrast, in ShiftMatch, we only do the matrix square roots once after training. For instance, it took us around 0.35s for a forward pass without spatial batchnorm, which contrasts with 0.77s for a forward pass with spatial batchnorm using a mini-batch of 128 inputs for CIFAR-10. ... we can fit a batch of 1000 for ImageNet even on a single 2080ti with 11GB of memory
Software Dependencies | No | The paper mentions software like Keras Applications and PyTorch (implicitly, as it is a deep learning paper), and specific network components like FRN, but does not provide specific version numbers for any of these to ensure reproducibility.
Experiment Setup | Yes | They used a ResNet-20 with only 40,960 of the 50,000 training samples (in order to evenly share the data across the TPU devices), and to ensure deterministic likelihood evaluations (which is necessary for HMC), turned off data augmentation and data subsampling (i.e. full batch training), and used filter response normalization (FRN) (Singh & Krishnan, 2020) rather than batch normalization (Ioffe & Szegedy, 2015).
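
The Pseudocode and Hardware Specification rows above mention an end-to-end ShiftMatch procedure, a minibatch of corrupted test points, and matrix square roots computed once after training. The snippet below is a rough, hypothetical sketch of that general statistic-matching idea, not the authors' Algorithm 1: the function names, the use of uncentred second-moment matrices, and the eigenvalue clipping with eps are all assumptions made for illustration.

```python
import numpy as np

def _psd_sqrt(S, eps=1e-6):
    # Symmetric PSD square root via eigendecomposition, with eigenvalues clipped at eps.
    w, V = np.linalg.eigh(S)
    w = np.clip(w, eps, None)
    return (V * np.sqrt(w)) @ V.T

def _psd_inv_sqrt(S, eps=1e-6):
    # Symmetric PSD inverse square root via eigendecomposition.
    w, V = np.linalg.eigh(S)
    w = np.clip(w, eps, None)
    return (V / np.sqrt(w)) @ V.T

def statistic_matching_transform(train_feats, test_feats, eps=1e-6):
    """Hypothetical sketch: linearly transform test features so that their
    uncentred second-moment matrix matches that of the training features.

    train_feats: (n_train, d) activations recorded on clean training data.
    test_feats:  (n_test, d) activations from a minibatch of (possibly corrupted) test data.
    """
    S_train = train_feats.T @ train_feats / train_feats.shape[0]
    S_test = test_feats.T @ test_feats / test_feats.shape[0]
    # The training-side square root can be computed once after training;
    # the test-side factor is recomputed from each test minibatch.
    M = _psd_sqrt(S_train, eps) @ _psd_inv_sqrt(S_test, eps)
    return test_feats @ M.T
```

With M = S_train^{1/2} S_test^{-1/2}, the transformed features satisfy M S_test M^T = S_train, so their second moments match the training statistics; how the paper applies such a transform across layers and HMC samples is described in its Algorithm 1, not here.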