Model-Agnostic Adversarial Detection by Random Perturbations

Authors: Bo Huang, Yi Wang, Wei Wang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations are performed on the MNIST, CIFAR-10 and ImageNet datasets. The results demonstrate that our detection method is effective and resilient against various attacks including black-box attacks and the powerful CW attack with four adversarial adaptations.
Researcher Affiliation | Academia | 1 Dongguan University of Technology, Dongguan, China; 2 Shenzhen University, Shenzhen, China; 3 The University of New South Wales, Sydney, Australia
Pseudocode | No | The paper describes the steps of the approach in paragraph form in Section 3.1 "Main Steps" but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper states "Our implementations are based on the Cleverhans 2.0 library (https://github.com/tensorflow/cleverhans)", but this refers to a third-party library used to generate attacks, not the authors' own source code for the proposed method.
Open Datasets | Yes | We evaluate the performance of our approach on detecting adversarial examples for the task of image classification over three benchmark datasets: MNIST, CIFAR-10, and ImageNet.
Dataset Splits | Yes | For MNIST and CIFAR-10, we used the designated training set for training and the designated test set for testing. For ImageNet, we used a pretrained DNN classifier and the first 10,000 samples of the validation set as our test examples for evaluation. We regard adversarial examples as the positive class and natural images as the negative class, and randomly select 80% of samples from each class to train the detector classifier, and use the remaining 20% for test.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or cloud instances.
Software Dependencies | Yes | Our implementations are based on the Cleverhans 2.0 library.
Experiment Setup | Yes | We apply a random perturbation η drawn i.i.d. from the Gaussian distribution N(0, diag(σ)), and measure the relative score difference for ĉ as r_ĉ = (F(x)[ĉ] - F(x + η)[ĉ]) / F(x)[ĉ]. To account for the stochastic nature of such raw signals, we repeat the process m times and extract statistically robust features from the sampled distribution. For example, we extract a 17-dimensional feature vector by taking the 10%, 15%, 20%, ..., 90% quantiles of the m samples so that it is more robust to noise and outliers. We then train a binary classifier for adversarial example detection. We use an SVM (with RBF kernel) classifier in our experiments. Here, κ = 2.0537 for m = 50 and σ = 0.05 for CIFAR-10. We regard adversarial examples as the positive class and natural images as the negative class, and randomly select 80% of samples from each class to train the detector classifier, and use the remaining 20% for test.
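
A minimal sketch of the detection pipeline described in the Experiment Setup row, assuming a Python/NumPy/scikit-learn setting, is given below. It is reconstructed only from the quoted description; `f` (a function returning softmax scores for a batch of images) and all helper names are illustrative assumptions, not the authors' implementation, which is not publicly released.

```python
# Illustrative sketch of the described detector; not the authors' code.
# Assumptions: `f` maps a batch of images to softmax score vectors, and
# NumPy/scikit-learn are acceptable stand-ins for the original tooling.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def relative_score_drops(f, x, sigma=0.05, m=50, rng=None):
    """Sample m Gaussian perturbations and record the relative drop of the
    predicted-class score: r_c = (F(x)[c] - F(x + eta)[c]) / F(x)[c]."""
    rng = np.random.default_rng() if rng is None else rng
    scores = f(x[None])[0]                          # F(x), softmax scores
    c_hat = int(np.argmax(scores))                  # predicted class ĉ
    drops = np.empty(m)
    for i in range(m):
        eta = rng.normal(0.0, sigma, size=x.shape)  # η ~ N(0, diag(σ))
        perturbed = f((x + eta)[None])[0]           # F(x + η)
        drops[i] = (scores[c_hat] - perturbed[c_hat]) / scores[c_hat]
    return drops

def quantile_features(drops):
    """17-dimensional feature: the 10%, 15%, ..., 90% quantiles of the m samples."""
    return np.quantile(drops, np.linspace(0.10, 0.90, 17))

def train_detector(f, natural, adversarial, sigma=0.05, m=50, seed=0):
    """Adversarial examples are the positive class; 80%/20% train/test split;
    binary SVM with an RBF kernel, as in the quoted setup."""
    X = np.stack([quantile_features(relative_score_drops(f, x, sigma, m))
                  for x in np.concatenate([natural, adversarial])])
    y = np.concatenate([np.zeros(len(natural)), np.ones(len(adversarial))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    detector = SVC(kernel="rbf").fit(X_tr, y_tr)
    return detector, detector.score(X_te, y_te)
```

With m = 50 and σ = 0.05 this mirrors the CIFAR-10 setting quoted above; the quantile features summarize the sampled distribution of score drops so that the detector is less sensitive to the noise of any single perturbation.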