Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free
Authors: Alexander Meinke, Julian Bitterwolf, Matthias Hein
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experiments on CIFAR10, CIFAR100 [29] and Restricted ImageNet (R.ImgNet) [53]. and Table 2: OOD performance: For all models we report accuracy on the test set of the in-distribution and AUCs, guaranteed AUCs (GAUC), adversarial AUCs (AAUC) for different test out-distributions. |
| Researcher Affiliation | Academia | Alexander Meinke, University of Tübingen, Tübingen AI Center; Julian Bitterwolf, University of Tübingen, Tübingen AI Center; Matthias Hein, University of Tübingen, Tübingen AI Center |
| Pseudocode | No | The paper provides mathematical equations and descriptions of the model and training process, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide code for all our experiments. and https://github.com/Alex-Meinke/Provable-OOD-Detection |
| Open Datasets | Yes | We provide experiments on CIFAR10, CIFAR100 [29] and Restricted ImageNet (R.ImgNet) [53]. and For the training out-distribution, we could follow previous work and use 80M Tiny Images [52] for CIFAR10/100... we choose to use Open Images [30] as training OOD instead. and For OOD evaluation for CIFAR10/100 we use the test sets from CIFAR100/10, SVHN [43], the classroom category of downscaled LSUN [55] (LSUN_CR)... For R.ImgNet we use Flowers [45], FGVC Aircraft [38], Stanford Cars [26] and smooth noise as test out-distributions. |
| Dataset Splits | No | For OOD evaluation for CIFAR10/100 we use the test sets from CIFAR100/10, SVHN [43], the classroom category of downscaled LSUN [55] (LSUN_CR) as well as smooth noise as suggested in [20] and described in App. D. For R.ImgNet we use Flowers [45], FGVC Aircraft [38], Stanford Cars [26] and smooth noise as test out-distributions. Since the computation of adversarial AUCs (next paragraph) requires computationally expensive adversarial attacks, we restrict the evaluation on the out-distribution to a fixed subset of 1000 images (300 in the case of LSUN_CR) for the CIFAR experiments and 400 for the R.ImgNet models. We still use the entire test set for the in-distribution. The paper lists the training and test datasets and the fixed OOD evaluation subsets, but does not give explicit train/validation/test split percentages or sizes for any dataset. |
| Hardware Specification | No | All schedules, hardware and hyperparameters are described in App. D. The main body of the paper does not contain explicit hardware specifications like specific GPU or CPU models. |
| Software Dependencies | No | The paper states 'All schedules, hardware and hyperparameters are described in App. D.' but does not list specific software dependencies with version numbers in the main text. |
| Experiment Setup | Yes | All schedules, hardware and hyperparameters are described in App. D. and We train the binary discriminator between in- and out-distribution using the loss in Eq. (5) with the bounds over an l∞-ball of radius ϵ = 0.01 for the out-distribution following [8]. and We train several ProoD models for binary shifts in {0, 1, 2, 3, 4, 5, 6} and We use APGD [13] (except on R.ImgNet, due to a memory leak) with 500 iterations and 5 random restarts. We also use a 200-step PGD attack with momentum of 0.9 and backtracking that starts with a step size of 0.1 which is halved every time a gradient step does not increase the confidence and gets multiplied by 1.1 otherwise. (A hedged sketch of this backtracking PGD attack is given below the table.) |
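
To make the quoted attack description concrete, below is a minimal sketch of a confidence-maximizing l∞ PGD attack with momentum and step-size backtracking. It is not the authors' released implementation; the model interface (a `model` returning class logits), the use of the maximum softmax probability as the confidence score, the image value range [0, 1], and the default radius `eps=0.01` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pgd_confidence_attack(model, x, eps=0.01, steps=200,
                          init_step_size=0.1, momentum=0.9):
    """Sketch of a confidence-maximizing l-inf PGD attack with momentum and
    per-example step-size backtracking, following the description quoted above.

    Assumptions (not from the paper's code): `model` returns class logits,
    confidence is the maximum softmax probability, and inputs live in [0, 1].
    """
    x_adv = x.clone().detach()
    step_size = torch.full((x.shape[0],), init_step_size, device=x.device)
    grad_momentum = torch.zeros_like(x)

    with torch.no_grad():
        best_conf = F.softmax(model(x_adv), dim=1).max(dim=1).values

    for _ in range(steps):
        x_adv.requires_grad_(True)
        conf = F.softmax(model(x_adv), dim=1).max(dim=1).values
        grad = torch.autograd.grad(conf.sum(), x_adv)[0]

        # Momentum on the ascent direction (the attack maximizes OOD confidence).
        grad_momentum = momentum * grad_momentum + grad.sign()
        step = step_size.view(-1, 1, 1, 1) * grad_momentum.sign()
        x_new = x_adv.detach() + step

        # Project back onto the l-inf ball around x and the [0, 1] image box.
        x_new = torch.min(torch.max(x_new, x - eps), x + eps).clamp(0.0, 1.0)

        with torch.no_grad():
            new_conf = F.softmax(model(x_new), dim=1).max(dim=1).values
            improved = new_conf > best_conf
            # Backtracking: grow the step size by 1.1 when the confidence
            # increased, halve it otherwise (per example).
            step_size = torch.where(improved, step_size * 1.1, step_size * 0.5)
            # Keep the new iterate only where it improved the confidence.
            x_adv = torch.where(improved.view(-1, 1, 1, 1), x_new, x_adv.detach())
            best_conf = torch.maximum(best_conf, new_conf)

    return x_adv.detach()
```

The per-example step-size schedule (×1.1 on improvement, ×0.5 otherwise) mirrors the backtracking rule quoted in the row above; all remaining details (initialization, restarts, APGD settings) should be taken from App. D and the authors' released code.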