Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
Authors: Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, James Bailey
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first provide explanations about how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof-of-concept, we show that a potential application of LID is to distinguish adversarial examples, and the preliminary results show that it can outperform several state-of-the-art detection measures by large margins for five attack strategies considered in this paper across three benchmark datasets. |
| Researcher Affiliation | Academia | (1) The University of Melbourne, Parkville, Australia; (2) University of California, Berkeley, USA; (3) Tsinghua University, Beijing, China; (4) University of Michigan, Ann Arbor, USA; (5) National Institute of Informatics, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1 Training phase for LID-based adversarial classifier |
| Open Source Code | Yes | Our code is available for download at https://github.com/xingjunm/lid_adversarial_subspace_detection. |
| Open Datasets | Yes | For each of the 5 forms of attack, the LID detector is compared with the state-of-the-art detection measures KD and BU as discussed in Section 2, with respect to three benchmark image datasets: MNIST (Le Cun et al., 1990), CIFAR-10 (Krizhevsky & Hinton, 2009) and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper describes a division into train (80%) and test (20%) sets. It also mentions parameter tuning using nested cross-validation, but it does not specify a distinct validation dataset split for training the models. |
| Hardware Specification | No | The paper describes the DNN architectures used (e.g., "5-layer Conv Net", "12-layer Conv Net") but does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using "the cleverhans library" and "the author's implementation" for certain attack strategies but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | Parameter Tuning: We tuned the bandwidth (σ) parameter for KD, and the number of nearest neighbors (k) for LID, using nested cross validation within the training set (train). Using the AUC values of detection performance, the bandwidth was tuned using a grid search over the range [0, 10) in log-space, and neighborhood size was tuned using a grid search over the range [10, 100) with respect to a minibatch of size 100. For a given dataset, the parameter setting selected was the one with highest AUC averaged across all attacks. The optimal bandwidths chosen for MNIST, CIFAR10 and SVHN were 3.79, 0.26, and 1.0, respectively, while the value of k for LID estimation was set to 20 for MNIST and CIFAR-10, and 30 for SVHN. For BU, we chose the number of prediction runs to be T = 50 in all experiments. |
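The detector evaluated in this paper is built on Local Intrinsic Dimensionality (LID) estimates computed from k-nearest-neighbour distances within a minibatch (k = 20 or 30 and minibatch size 100, per the Experiment Setup row above). As a rough illustration, the sketch below implements the standard maximum-likelihood LID estimator over a minibatch; the function name `lid_mle` and the use of SciPy are my own choices for this sketch, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lid_mle(batch, k=20):
    """Maximum-likelihood LID estimate for every point in a minibatch.

    For a point with sorted distances r_1 <= ... <= r_k to its k nearest
    neighbours inside the batch, the estimate is
        LID = -1 / ( (1/k) * sum_i log(r_i / r_k) ).
    """
    dists = cdist(batch, batch)                  # pairwise Euclidean distances
    knn = np.sort(dists, axis=1)[:, 1:k + 1]     # drop the zero self-distance
    knn = np.maximum(knn, 1e-12)                 # guard against log(0)
    return -1.0 / np.mean(np.log(knn / knn[:, -1:]), axis=1)
```

In the paper the estimate is computed layer by layer on DNN activations rather than on raw inputs, which yields one LID feature per layer for each example.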
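The Pseudocode row above refers to Algorithm 1, the training phase of the LID-based adversarial classifier. A minimal sketch of that phase is given below, assuming matched minibatches of clean, noisy and adversarial examples whose per-layer activations are already available; it reuses `lid_mle` from the previous sketch, and the logistic-regression classifier follows the paper's description, but the data layout and the names `lid_features` and `train_lid_detector` are hypothetical.

```python
from sklearn.linear_model import LogisticRegressionCV

def lid_features(per_layer_acts, k=20):
    """One LID estimate per DNN layer, stacked into a feature vector per example.

    `per_layer_acts` is a list of (batch_size, d_layer) activation arrays,
    one entry per layer (an assumed data layout, not the authors' API).
    """
    return np.stack([lid_mle(acts, k) for acts in per_layer_acts], axis=1)

def train_lid_detector(minibatches, k=20):
    """Fit a detector on LID features of clean (0), noisy (0) and adversarial (1) examples."""
    feats, labels = [], []
    for clean_acts, noisy_acts, adv_acts in minibatches:
        for acts, label in ((clean_acts, 0), (noisy_acts, 0), (adv_acts, 1)):
            feats.append(lid_features(acts, k))
            labels.append(np.full(acts[0].shape[0], label))
    X, y = np.concatenate(feats), np.concatenate(labels)
    return LogisticRegressionCV(max_iter=1000).fit(X, y)
```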
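The Experiment Setup row states that the neighbourhood size k for LID was tuned by nested cross-validation within the training split, using a grid search over [10, 100) and selecting by AUC. A simplified, single-attack version of that search is sketched below, reusing `lid_features` and the imports from the previous sketch; the fold count, grid step and function name `tune_k` are assumptions made for illustration.

```python
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def tune_k(per_layer_acts, labels, k_grid=range(10, 100, 10), n_folds=5):
    """Grid-search the LID neighbourhood size k by cross-validated AUC."""
    labels = np.asarray(labels)
    best_k, best_auc = None, -np.inf
    for k in k_grid:
        X = lid_features(per_layer_acts, k)      # recompute features for this k
        fold_aucs = []
        for tr, va in StratifiedKFold(n_folds).split(X, labels):
            clf = LogisticRegressionCV(max_iter=1000).fit(X[tr], labels[tr])
            scores = clf.predict_proba(X[va])[:, 1]
            fold_aucs.append(roc_auc_score(labels[va], scores))
        if np.mean(fold_aucs) > best_auc:
            best_k, best_auc = k, float(np.mean(fold_aucs))
    return best_k, best_auc
```

Note that the paper selects one k per dataset using the AUC averaged across all five attacks; the sketch above handles a single attack for brevity.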