Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free

Authors: Alexander Meinke, Julian Bitterwolf, Matthias Hein

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide experiments on CIFAR10, CIFAR100 [29] and Restricted ImageNet (RImgNet) [53]." and "Table 2: OOD performance: For all models we report accuracy on the test set of the in-distribution and AUCs, guaranteed AUCs (GAUC), adversarial AUCs (AAUC) for different test out-distributions." (A sketch of how such AUCs are computed from confidence scores follows the table.)
Researcher Affiliation | Academia | Alexander Meinke (University of Tübingen, Tübingen AI Center); Julian Bitterwolf (University of Tübingen, Tübingen AI Center); Matthias Hein (University of Tübingen, Tübingen AI Center)
Pseudocode | No | The paper provides mathematical equations and descriptions of the model and training process, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We provide code for all our experiments." and https://github.com/Alex-Meinke/Provable-OOD-Detection
Open Datasets | Yes | "We provide experiments on CIFAR10, CIFAR100 [29] and Restricted ImageNet (RImgNet) [53]." and "For the training out-distribution, we could follow previous work and use 80M Tiny Images [52] for CIFAR10/100... we choose to use Open Images [30] as training OOD instead." and "For OOD evaluation for CIFAR10/100 we use the test sets from CIFAR100/10, SVHN [43], the classroom category of downscaled LSUN [55] (LSUN_CR)... For RImgNet we use Flowers [45], FGVC Aircraft [38], Stanford Cars [26] and smooth noise as test out-distributions."
Dataset Splits | No | "For OOD evaluation for CIFAR10/100 we use the test sets from CIFAR100/10, SVHN [43], the classroom category of downscaled LSUN [55] (LSUN_CR) as well as smooth noise as suggested in [20] and described in App. D. For RImgNet we use Flowers [45], FGVC Aircraft [38], Stanford Cars [26] and smooth noise as test out-distributions. Since the computation of adversarial AUCs (next paragraph) requires computationally expensive adversarial attacks, we restrict the evaluation on the out-distribution to a fixed subset of 1000 images (300 in the case of LSUN_CR) for the CIFAR experiments and 400 for the RImgNet models. We still use the entire test set for the in-distribution." The paper states which datasets are used for training and testing and mentions the fixed OOD evaluation subsets, but it does not give explicit training/validation/test split percentages or fixed split sizes. (A sketch of the fixed-subset selection follows the table.)
Hardware Specification | No | "All schedules, hardware and hyperparameters are described in App. D." The main body of the paper does not contain explicit hardware specifications such as specific GPU or CPU models.
Software Dependencies | No | The paper states "All schedules, hardware and hyperparameters are described in App. D." but does not list specific software dependencies with version numbers in the main text.
Experiment Setup | Yes | "All schedules, hardware and hyperparameters are described in App. D." and "We train the binary discriminator between in- and out-distribution using the loss in Eq. (5) with the bounds over an ℓ∞-ball of radius ε = 0.01 for the out-distribution following [8]." and "We train several ProoD models for binary shifts in {0, 1, 2, 3, 4, 5, 6}" and "We use APGD [13] (except on RImgNet, due to a memory leak) with 500 iterations and 5 random restarts. We also use a 200-step PGD attack with momentum of 0.9 and backtracking that starts with a step size of 0.1, which is halved every time a gradient step does not increase the confidence and gets multiplied by 1.1 otherwise." (A sketch of such a PGD attack with backtracking follows the table.)
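
The AUC, GAUC and AAUC values referenced above are ranking metrics over detector confidences. Below is a minimal sketch, not the authors' code, of how a standard OOD AUC is computed; conf_in and conf_out are hypothetical arrays of confidences on in-distribution and out-distribution test points.

```python
# Minimal sketch: AUC for separating in-distribution from OOD samples by confidence.
# conf_in / conf_out are illustrative names, not identifiers from the paper's code.
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auc(conf_in: np.ndarray, conf_out: np.ndarray) -> float:
    """AUC with in-distribution as the positive class (1) and OOD as the negative class (0)."""
    labels = np.concatenate([np.ones_like(conf_in), np.zeros_like(conf_out)])
    scores = np.concatenate([conf_in, conf_out])
    return roc_auc_score(labels, scores)
```

For AAUC the OOD confidences are replaced by the confidences obtained after an adversarial attack on each OOD input; for GAUC they are replaced by certified upper bounds on the confidence over the perturbation ball.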
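
The fixed OOD evaluation subsets (1000 images for the CIFAR out-distributions, 300 for LSUN_CR, 400 for RImgNet) could be drawn as sketched below, assuming a seeded random permutation over a standard PyTorch Dataset; the paper does not specify how its fixed subsets were selected.

```python
# Sketch: pick a fixed, reproducible subset of an OOD dataset for adversarial evaluation.
# The seed value is arbitrary; the selection procedure is an assumption, not from the paper.
import torch
from torch.utils.data import Dataset, Subset

def fixed_ood_subset(dataset: Dataset, n_samples: int, seed: int = 0) -> Subset:
    g = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=g)[:n_samples]
    return Subset(dataset, indices.tolist())
```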
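
The quoted 200-step PGD attack with momentum 0.9 and step-size backtracking can be sketched as follows for a single OOD input. confidence_fn (the detector's confidence score), the [0, 1] image range, the exponential-moving-average momentum convention and the absolute step size are assumptions for illustration, not the authors' exact implementation.

```python
# Hedged sketch of a PGD attack that maximizes the detector's confidence on an OOD input
# inside an l-infinity ball, with momentum and step-size backtracking: the step size is
# halved when a step does not increase the confidence and multiplied by 1.1 otherwise.
import torch

def pgd_confidence_attack(confidence_fn, x, eps=0.01, steps=200,
                          momentum=0.9, step_size=0.1):
    x_orig = x.detach()
    x_adv = x_orig.clone()
    grad_avg = torch.zeros_like(x_adv)
    best_conf = confidence_fn(x_adv).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        conf = confidence_fn(x_adv)
        grad = torch.autograd.grad(conf.sum(), x_adv)[0]
        # Exponential moving average of the gradient (one common momentum convention).
        grad_avg = momentum * grad_avg + (1 - momentum) * grad

        with torch.no_grad():
            # Candidate signed-gradient step, projected back into the eps-ball and image range.
            x_cand = x_adv + step_size * grad_avg.sign()
            x_cand = x_orig + (x_cand - x_orig).clamp(-eps, eps)
            x_cand = x_cand.clamp(0.0, 1.0)

            new_conf = confidence_fn(x_cand)
            if new_conf > best_conf:
                # Accept the step and grow the step size.
                x_adv, best_conf = x_cand, new_conf
                step_size *= 1.1
            else:
                # Backtrack: keep the current iterate and shrink the step size.
                step_size *= 0.5

    return x_adv.detach()
```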