The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
Authors: Kevin Roth, Yannic Kilcher, Thomas Hofmann
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify our approach empirically, but also provide conditions under which detectability via the suggested test statistics is guaranteed to be effective. In our experiments, we show that it is even possible to correct test time predictions for adversarial attacks with high accuracy. We evaluate our method against strong iterative attacks and show that even an adversary aware of the defense cannot evade our detector. E.g. for an L∞-PGD white-box attack on CIFAR10, our method achieves a detection rate of 99% (FPR 1%), with accuracies of 96% on clean and 92% on adversarial samples respectively. On ImageNet, we achieve a detection rate of 99% (FPR 1%). |
| Researcher Affiliation | Academia | Kevin Roth * 1 Yannic Kilcher * 1 Thomas Hofmann 1 *Equal contribution 1Department of Computer Science, ETH Zürich. Correspondence to: <kevin.roth@inf.ethz.ch>, <yannic.kilcher@inf.ethz.ch>, <thomas.hofmann@inf.ethz.ch>. |
| Pseudocode | No | The paper describes its proposed statistical test and methods in detail, including mathematical formulations and propositions. However, it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/yk/icml19_public. |
| Open Datasets | Yes | In this section, we provide experimental support for our theoretical propositions and we benchmark our detection and correction methods on various architectures of deep neural networks trained on the CIFAR10 and ImageNet datasets. |
| Dataset Splits | No | Table 1 shows test set accuracies... For CIFAR10, we compare the Wide ResNet implementation of Madry et al. (2017)... While standard datasets like CIFAR10 and ImageNet have predefined splits, the paper itself does not explicitly state the train/validation/test split ratios or counts, nor does it mention using a validation set; it focuses on test-set evaluation. |
| Hardware Specification | No | The paper discusses the architectures and datasets used (e.g., Wide ResNet, CNN7, CIFAR10, ImageNet) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | For ImageNet, we use a selection of models from the torchvision package (Marcel & Rodriguez, 2010). This names a package but gives no version number, and no other software dependencies with versions are listed. |
| Experiment Setup | Yes | As a default attack strategy we use an L∞-norm constrained PGD white-box attack. The attack budget ε was chosen to be the smallest value such that almost all examples are successfully attacked. For CIFAR10 this is ε = 8/255, for ImageNet ε = 2/255. ... For the remainder of this paper, we thus fixed the number of attack iterations to 20. |
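The attack described under Experiment Setup (an L∞-norm constrained PGD with ε = 8/255 and 20 iterations) can be sketched generically. This is not the authors' code; `pgd_linf` and its `step` size are illustrative choices, and the model is abstracted behind a `grad_fn` that returns the loss gradient with respect to the input.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8/255, step=2/255, iters=20, rng=None):
    """Sketch of an L-infinity PGD attack (hypothetical helper, not from the paper).

    x       : clean input, numpy array with values in [0, 1]
    grad_fn : callable returning d(loss)/d(input) at a given point
    eps     : attack budget (the paper uses 8/255 on CIFAR10, 2/255 on ImageNet)
    iters   : number of attack iterations (the paper fixes this to 20)
    """
    rng = rng or np.random.default_rng(0)
    # Random start inside the eps-ball, a common choice for PGD.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(iters):
        g = grad_fn(x_adv)
        x_adv = x_adv + step * np.sign(g)         # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv
```

With a real model, `grad_fn` would backpropagate the cross-entropy loss through the network; the sign step and the projection are what make the attack L∞-constrained.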
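The headline numbers ("detection rate of 99% at FPR 1%") follow the usual recipe of calibrating a threshold on clean samples and then measuring how many adversarial samples exceed it. A minimal sketch, assuming a scalar detection score per sample (the function name `detection_at_fpr` is hypothetical, not from the paper):

```python
import numpy as np

def detection_at_fpr(clean_scores, adv_scores, fpr=0.01):
    """Calibrate a threshold so that a fraction `fpr` of clean samples
    is (falsely) flagged, then report the adversarial detection rate."""
    # Threshold at the (1 - fpr) quantile of the clean-score distribution.
    thresh = np.quantile(clean_scores, 1.0 - fpr)
    detection_rate = float(np.mean(adv_scores > thresh))
    return thresh, detection_rate
```

Holding the false-positive rate fixed makes detection rates comparable across attacks and datasets, which is why the paper reports both numbers together.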