Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
Authors: Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a class-conditional capsule-reconstruction-based detection method to detect standard white-box/black-box adversarial examples on three datasets. This detection mechanism is attack-agnostic and is successfully extended to standard convolutional neural networks. We test our detection mechanism on the corrupted MNIST dataset and show that it can work as a general out-of-distribution detector. A stronger reconstructive attack is specifically designed to attack our detection mechanism but becomes less successful in fooling the classifier. We perform extensive qualitative studies to explain the superior performance of CapsNets in detecting adversarial examples compared to CNNs. The results suggest that the features captured by CapsNets are more aligned with human perception. (A minimal sketch of this detection rule appears below the table.) |
| Researcher Affiliation | Collaboration | Yao Qin, UC San Diego, yaq007@eng.ucsd.edu; Nicholas Frosst, Google Brain, frosst@google.com; Sara Sabour, Google Brain, sasabour@google.com; Colin Raffel, Google Brain, craffel@google.com; Garrison Cottrell, UC San Diego, gary@eng.ucsd.edu; Geoffrey Hinton, Google Brain, geoffhinton@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We run experiments on three datasets: MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011). Corrupted MNIST dataset (Mu & Gilmer, 2019). CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | We find the threshold θ for detecting adversarial inputs by measuring the reconstruction error between a validation input image and its reconstruction. If the distance between the input and the reconstruction is above the chosen threshold θ, we classify the data as adversarial. Choosing the detection threshold θ involves a trade-off between false positive and false negative detection rates. ... In our experiments we don't tune this parameter and simply set it as the 95th percentile of validation distances. This means our false positive rate on real validation data is 5%. (The threshold selection is illustrated in the sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or specific cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers like PyTorch 1.x or TensorFlow 2.x) needed to replicate the experiment. |
| Experiment Setup | Yes | The reconstruction network is simply a fully connected neural network with two ReLU hidden layers with 512 and 1024 units respectively, with a sigmoid output with the same dimensionality as the dataset. In all experiments, all three models (CapsNet, CNN+R, and CNN+CR) have the same number of parameters and were trained with Adam (Kingma & Ba, 2014) for the same number of epochs. For all the ℓ∞-based adversarial examples, the ℓ∞ norm of the perturbations is bounded by ϵ, which is set to 0.3, 0.1, and 0.1 for the MNIST, Fashion MNIST, and SVHN datasets respectively, following previous work (Madry et al., 2017; Song et al., 2017). In FGSM-based attacks, the step size c is 0.05. In BIM-based (Kurakin et al., 2016) and PGD-based (Madry et al., 2017) attacks, the step size c is 0.01 for all the datasets, and the numbers of iterations are 1000, 500, and 200 for the MNIST, Fashion MNIST, and SVHN datasets respectively. (Sketches of the reconstruction network and the PGD attack settings appear below the table.) |
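
The detection rule summarized in the table (reconstruct the input conditioned on the predicted class, then flag it as adversarial when the reconstruction error exceeds a threshold θ set to the 95th percentile of validation reconstruction distances) can be expressed as a short sketch. This is a minimal illustration under assumed interfaces, not the authors' code: `model.predict` and `model.reconstruct` are hypothetical names for the classifier and its class-conditional reconstruction network.

```python
import numpy as np

def reconstruction_error(model, x):
    """L2 distance between an input and its class-conditional reconstruction.

    Hypothetical interface: `model.predict` returns the winning class and
    `model.reconstruct` decodes the corresponding class-conditional
    representation back to image space, as the paper describes.
    """
    y_pred = model.predict(x)
    x_recon = model.reconstruct(x, y_pred)
    return np.linalg.norm(x.ravel() - x_recon.ravel())

def fit_threshold(model, x_val, percentile=95.0):
    """Set theta to the 95th percentile of validation reconstruction errors,
    giving a ~5% false-positive rate on clean validation data."""
    errors = [reconstruction_error(model, x) for x in x_val]
    return np.percentile(errors, percentile)

def is_adversarial(model, x, theta):
    """Flag the input as adversarial if its reconstruction error exceeds theta."""
    return reconstruction_error(model, x) > theta
```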
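The reconstruction sub-network is described only at the architecture level: two fully connected ReLU hidden layers with 512 and 1024 units and a sigmoid output matching the input dimensionality. A plausible sketch is below; the choice of PyTorch and the class/argument names are assumptions, since the paper names no framework.

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Fully connected decoder: two ReLU hidden layers (512, 1024 units)
    and a sigmoid output with the same dimensionality as the input image
    (e.g. 28*28 = 784 for MNIST). Framework choice is an assumption."""

    def __init__(self, in_dim, out_dim=784):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, pose):
        # `pose` is the winning capsule's pose vector (CapsNet) or the
        # class-conditioned representation (CNN+R / CNN+CR variants).
        return self.decoder(pose)
```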
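The reported attack settings (ℓ∞ budget ϵ, FGSM step size 0.05, and BIM/PGD step size 0.01 with 1000/500/200 iterations for MNIST, Fashion MNIST, and SVHN) correspond to a standard ℓ∞ PGD loop such as the one sketched below; the model is assumed to return class logits, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, step_size=0.01, num_steps=1000):
    """Standard l-infinity PGD (Madry et al., 2017) with the hyperparameters
    reported in the paper, e.g. eps=0.3 and 1000 steps for MNIST.
    (Random initialization inside the eps-ball is often added in practice.)"""
    x, y = x.detach(), y.detach()
    x_adv = x.clone()
    for _ in range(num_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed gradient-ascent step, then project back into the
        # l-infinity ball of radius eps around the clean image.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)  # keep pixels in the valid [0, 1] range
    return x_adv.detach()
```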