Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Authors: Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs. In contrast, a state-of-the-art feature squeezing technique can only achieve 55% accuracy with 23.3% false positives. We use one of the most widely used FRSes, VGG-Face [19] to demonstrate effectiveness of AmI. Three datasets, VGG Face dataset (VF) [18], Labeled Faces in the Wild (LFW) [33] and CelebFaces Attributes dataset (CelebA) [34] are employed.
Researcher Affiliation | Academia | Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang. Department of Computer Science, Purdue University. {taog, ma229, liu1751, xyzhang}@cs.purdue.edu
Pseudocode | No | The paper describes its method in prose and mathematical equations but does not include any explicit pseudocode blocks or algorithms.
Open Source Code | Yes | AmI is available at GitHub [25]. [25] AmIAttribute. AmIAttribute/AmI. https://github.com/AmIAttribute/AmI, 2018.
Open Datasets | Yes | Three datasets, VGG Face dataset (VF) [18], Labeled Faces in the Wild (LFW) [33] and CelebFaces Attributes dataset (CelebA) [34] are employed. We use a small subset of the VF dataset (10 images) to extract attribute witnesses... We use 2000 training images from the VF set (1000 with the attribute and 1000 without the attribute) to train the model.
Dataset Splits | Yes | ϵ and β are set to 1.15 and 60, respectively in this paper. They are chosen through a tuning set of 100 benign images, which has no overlap with the test set.
Hardware Specification | No | The paper does not specify any details about the hardware (e.g., GPU, CPU models, memory) used for running the experiments.
Software Dependencies | Yes | We use the CleverHans library [36] to generate untargeted attacks FGSM and BIM. [36] Nicolas Papernot, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Fartash Faghri, Alexander Matyasko, Karen Hambardzumyan, Yi-Lin Juang, Alexey Kurakin, Ryan Sheatsley, et al. CleverHans v2.0.0: An Adversarial Machine Learning Library. arXiv preprint arXiv:1610.00768, 2016.
Experiment Setup | Yes | α defines the magnitude of weakening, which is set to 100 in this paper. ϵ and β are set to 1.15 and 60, respectively in this paper.
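
The two headline metrics and the reported hyperparameters in the rows above can be made concrete with a short sketch. It is illustrative only and is not taken from the AmI repository: the class name AmIHyperparameters, its field names, and the function detection_rates are hypothetical. It also assumes that "detection accuracy" means the fraction of adversarial inputs the detector flags and that "false positives" means the fraction of benign inputs it flags.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AmIHyperparameters:
    """Values quoted from the paper; the field names are hypothetical."""
    alpha: float = 100.0   # magnitude of weakening
    epsilon: float = 1.15  # chosen on a tuning set of 100 benign images
    beta: float = 60.0     # chosen on the same tuning set


def detection_rates(
    flagged: Sequence[bool], is_adversarial: Sequence[bool]
) -> Tuple[float, float]:
    """Return (detection accuracy, false-positive rate).

    Assumed definitions: detection accuracy is the share of adversarial
    inputs that were flagged; the false-positive rate is the share of
    benign inputs that were flagged.
    """
    if len(flagged) != len(is_adversarial):
        raise ValueError("flagged and is_adversarial must have equal length")
    on_adversarial = [f for f, a in zip(flagged, is_adversarial) if a]
    on_benign = [f for f, a in zip(flagged, is_adversarial) if not a]
    if not on_adversarial or not on_benign:
        raise ValueError("need at least one adversarial and one benign input")
    accuracy = sum(on_adversarial) / len(on_adversarial)
    false_positive_rate = sum(on_benign) / len(on_benign)
    return accuracy, false_positive_rate


if __name__ == "__main__":
    params = AmIHyperparameters()
    # Toy decisions: three adversarial inputs followed by three benign ones.
    decisions = [True, True, False, True, False, False]
    labels = [True, True, True, False, False, False]
    acc, fpr = detection_rates(decisions, labels)
    print(params, acc, fpr)
```

Keeping the two rates separate mirrors how the summary reports them: one number over adversarial inputs and one over benign inputs, rather than a single pooled accuracy.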