Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Authors: Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs. In contrast, a state-of-the-art feature squeezing technique can only achieve 55% accuracy with 23.3% false positives. We use one of the most widely used FRSes, VGG-Face [19] to demonstrate effectiveness of AmI. Three datasets, VGG Face dataset (VF) [18], Labeled Faces in the Wild (LFW) [33] and CelebFaces Attributes dataset (CelebA) [34] are employed. |
| Researcher Affiliation | Academia | Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang. Department of Computer Science, Purdue University. {taog, ma229, liu1751, xyzhang}@cs.purdue.edu |
| Pseudocode | No | The paper describes its method in prose and mathematical equations but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | AmI is available at GitHub [25]. [25] AmIAttribute. AmIAttribute/AmI. https://github.com/AmIAttribute/AmI, 2018. |
| Open Datasets | Yes | Three datasets, VGG Face dataset (VF) [18], Labeled Faces in the Wild (LFW) [33] and CelebFaces Attributes dataset (CelebA) [34] are employed. We use a small subset of the VF dataset (10 images) to extract attribute witnesses... We use 2000 training images from the VF set (1000 with the attribute and 1000 without the attribute) to train the model. |
| Dataset Splits | Yes | ϵ and β are set to 1.15 and 60, respectively in this paper. They are chosen through a tuning set of 100 benign images, which has no overlap with the test set. |
| Hardware Specification | No | The paper does not specify any details about the hardware (e.g., GPU, CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We use the CleverHans library [36] to generate untargeted attacks FGSM and BIM. [36] Nicolas Papernot, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Fartash Faghri, Alexander Matyasko, Karen Hambardzumyan, Yi-Lin Juang, Alexey Kurakin, Ryan Sheatsley, et al. CleverHans v2.0.0: An Adversarial Machine Learning Library. arXiv preprint arXiv:1610.00768, 2016. |
| Experiment Setup | Yes | α defines the magnitude of weakening, which is set to 100 in this paper. ϵ and β are set to 1.15 and 60, respectively in this paper. |
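
The hyperparameters and the two headline metrics quoted in the table can be collected in a short Python sketch. This is only an illustration and is not taken from the AmI repository: the class and function names are hypothetical, and it assumes that "detection accuracy" means the fraction of adversarial inputs the detector flags and that "false positives" means the fraction of benign inputs it flags.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AmIHyperparameters:
    """Values quoted in the table above; the class itself is hypothetical."""
    alpha: float = 100.0   # magnitude of weakening
    epsilon: float = 1.15  # chosen on a 100-image benign tuning set
    beta: float = 60.0     # chosen on the same tuning set


def _flag_rate(flags: Sequence[bool]) -> float:
    """Fraction of inputs that the detector flagged as adversarial."""
    if len(flags) == 0:
        raise ValueError("need at least one detector decision")
    return sum(bool(f) for f in flags) / len(flags)


def detection_accuracy(flags_on_adversarial: Sequence[bool]) -> float:
    """Assumed definition: share of adversarial inputs that were flagged."""
    return _flag_rate(flags_on_adversarial)


def false_positive_rate(flags_on_benign: Sequence[bool]) -> float:
    """Assumed definition: share of benign inputs that were flagged."""
    return _flag_rate(flags_on_benign)


if __name__ == "__main__":
    params = AmIHyperparameters()
    # Toy detector decisions, made up purely for illustration.
    adversarial_flags = [True, True, True, False]
    benign_flags = [False, False, True, False]
    print(params)
    print(detection_accuracy(adversarial_flags))
    print(false_positive_rate(benign_flags))
```

Keeping the two rates separate mirrors how the table reports them: one number over adversarial inputs and one over benign inputs, rather than a single blended accuracy.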