Model-Agnostic Adversarial Detection by Random Perturbations
Authors: Bo Huang, Yi Wang, Wei Wang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations are performed on the MNIST, CIFAR-10 and ImageNet datasets. The results demonstrate that our detection method is effective and resilient against various attacks including black-box attacks and the powerful CW attack with four adversarial adaptations. |
| Researcher Affiliation | Academia | Dongguan University of Technology, Dongguan, China; Shenzhen University, Shenzhen, China; The University of New South Wales, Sydney, Australia |
| Pseudocode | No | The paper describes the steps of the approach in paragraph format in Section 3.1 "Main Steps" but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper states "Our implementations are based on the CleverHans 2.0 library" and links to https://github.com/tensorflow/cleverhans, but this refers to the third-party library used to generate attacks, not the authors' own source code for the proposed method. |
| Open Datasets | Yes | We evaluate the performance of our approach on detecting adversarial examples for the task of image classification over three benchmark datasets: MNIST, CIFAR-10, and ImageNet. |
| Dataset Splits | Yes | For MNIST and CIFAR-10, we used the designated training set for training and the designated test set for testing. For ImageNet, we used a pretrained DNN classifier and the first 10,000 samples of the validation set as our test examples for evaluation. We regard adversarial examples as the positive class and natural images as the negative class, randomly select 80% of samples from each class to train the detector classifier, and use the remaining 20% for test. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or cloud instances. |
| Software Dependencies | Yes | Our implementations are based on the CleverHans 2.0 library (https://github.com/tensorflow/cleverhans). |
| Experiment Setup | Yes | We apply a random perturbation η drawn i.i.d. from the Gaussian distribution N(0, diag(σ)) and measure the relative score difference for the predicted class ĉ as $r_{\hat{c}} = \frac{F(x)[\hat{c}] - F(x+\eta)[\hat{c}]}{F(x)[\hat{c}]}$. To account for the stochastic nature of such raw signals, we repeat the process m times and extract a statistically robust feature from the sampled distribution. For example, we extract a 17-dimensional feature vector by taking the 10%, 15%, 20%, ..., 90% quantiles of the m samples so that it is more robust to noise and outliers. We then train a binary classifier for adversarial example detection; we use an SVM (with RBF kernel) classifier in our experiments. Here, κ = 2.0537 for m = 50, and σ = 0.05 for CIFAR-10. We regard adversarial examples as the positive class and natural images as the negative class, randomly select 80% of samples from each class to train the detector classifier, and use the remaining 20% for test. (A minimal sketch of this pipeline appears after the table.) |
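
The setup quoted in the Experiment Setup row can be made concrete with a short sketch. The following is a minimal, hypothetical Python rendering of the perturbation-based feature extraction, not the authors' released code (which, per the Open Source Code row, is unavailable): `predict_proba` stands in for the target classifier's softmax output F(x), and σ = 0.05, m = 50 are the CIFAR-10 settings quoted above. The clipping of perturbed inputs to [0, 1] is an added assumption for image data and is not stated in the quoted text.

```python
import numpy as np

def perturbation_feature(predict_proba, x, sigma=0.05, m=50):
    """17-dim quantile feature of the relative score drop under Gaussian noise.

    predict_proba: callable mapping one input image to a softmax score vector F(x).
    sigma: per-element standard deviation of the Gaussian perturbation eta.
    m: number of random perturbations sampled for this input.
    """
    scores = predict_proba(x)
    c_hat = int(np.argmax(scores))  # predicted class of the unperturbed input
    r = np.empty(m)
    for i in range(m):
        eta = np.random.normal(0.0, sigma, size=x.shape)  # eta ~ N(0, diag(sigma))
        noisy_scores = predict_proba(np.clip(x + eta, 0.0, 1.0))
        # relative score difference for the predicted class c_hat
        r[i] = (scores[c_hat] - noisy_scores[c_hat]) / scores[c_hat]
    # 10%, 15%, ..., 90% quantiles of the m samples -> 17-dimensional feature
    return np.quantile(r, np.linspace(0.10, 0.90, 17))
```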
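
Likewise, the detector training described in the Dataset Splits and Experiment Setup rows (adversarial examples as the positive class, natural images as the negative class, a per-class 80/20 split, and an RBF-kernel SVM) could look roughly as follows; the scikit-learn classes are an assumption, since the paper does not name its SVM implementation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_detector(features, labels, seed=0):
    """Fit the binary adversarial-example detector on quantile features.

    features: (n_samples, 17) array built with perturbation_feature.
    labels: 1 for adversarial examples (positive), 0 for natural images (negative).
    """
    # randomly select 80% of samples from each class for training, 20% for test
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    detector = SVC(kernel="rbf")
    detector.fit(X_train, y_train)
    print("held-out detection accuracy:", detector.score(X_test, y_test))
    return detector
```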