Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
Authors: Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a class-conditional capsule-reconstruction-based detection method to detect standard white-box/black-box adversarial examples on three datasets. This detection mechanism is attack-agnostic and is successfully extended to standard convolutional neural networks. We test our detection mechanism on the corrupted MNIST dataset and show that it can work as a general out-of-distribution detector. A stronger reconstructive attack is specifically designed to attack our detection mechanism but becomes less successful in fooling the classifier. We perform extensive qualitative studies to explain the superior performance of CapsNets in detecting adversarial examples compared to CNNs. The results suggest that the features captured by CapsNets are more aligned with human perception. (A minimal sketch of this detection rule appears below the table.) |
| Researcher Affiliation | Collaboration | Yao Qin, UC San Diego, yaq007@eng.ucsd.edu; Nicholas Frosst, Google Brain, frosst@google.com; Sara Sabour, Google Brain, sasabour@google.com; Colin Raffel, Google Brain, craffel@google.com; Garrison Cottrell, UC San Diego, gary@eng.ucsd.edu; Geoffrey Hinton, Google Brain, geoffhinton@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We run experiments on three datasets: MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011). Corrupted MNIST dataset (Mu & Gilmer, 2019). CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | We find the threshold θ for detecting adversarial inputs by measuring the reconstruction error between a validation input image and its reconstruction. If the distance between the input and the reconstruction is above the chosen threshold θ, we classify the data as adversarial. Choosing the detection threshold θ involves a trade-off between false positive and false negative detection rates. ... In our experiments we don't tune this parameter and simply set it as the 95th percentile of validation distances. This means our false positive rate on real validation data is 5%. (The threshold selection is illustrated in the sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or specific cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers like PyTorch 1.x or TensorFlow 2.x) needed to replicate the experiment. |
| Experiment Setup | Yes | The reconstruction network is simply a fully connected neural network with two ReLU hidden layers with 512 and 1024 units respectively, with a sigmoid output with the same dimensionality as the dataset. In all experiments, all three models (CapsNet, CNN+R, and CNN+CR) have the same number of parameters and were trained with Adam (Kingma & Ba, 2014) for the same number of epochs. For all the ℓ∞-based adversarial examples, the ℓ∞ norm of the perturbations is bounded by ϵ, which is set to 0.3, 0.1, and 0.1 for the MNIST, Fashion MNIST, and SVHN datasets respectively, following previous work (Madry et al., 2017; Song et al., 2017). In FGSM-based attacks, the step size c is 0.05. In BIM-based (Kurakin et al., 2016) and PGD-based (Madry et al., 2017) attacks, the step size c is 0.01 for all the datasets, and the numbers of iterations are 1000, 500, and 200 for the MNIST, Fashion MNIST, and SVHN datasets respectively. (Sketches of the reconstruction network and the PGD attack settings appear below the table.) |
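
The detection rule summarized in the table (reconstruct the input conditioned on the predicted class, then flag it as adversarial when the reconstruction error exceeds a threshold θ set to the 95th percentile of validation reconstruction distances) can be expressed as a short sketch. This is a minimal illustration under assumed interfaces, not the authors' code: `model.predict` and `model.reconstruct` are hypothetical names for the classifier and its class-conditional reconstruction network.

```python
import numpy as np

def reconstruction_error(model, x):
    """L2 distance between an input and its class-conditional reconstruction.

    Hypothetical interface: `model.predict` returns the winning class and
    `model.reconstruct` decodes the corresponding class-conditional
    representation back to image space, as the paper describes.
    """
    y_pred = model.predict(x)
    x_recon = model.reconstruct(x, y_pred)
    return np.linalg.norm(x.ravel() - x_recon.ravel())

def fit_threshold(model, x_val, percentile=95.0):
    """Set theta to the 95th percentile of validation reconstruction errors,
    giving a ~5% false-positive rate on clean validation data."""
    errors = [reconstruction_error(model, x) for x in x_val]
    return np.percentile(errors, percentile)

def is_adversarial(model, x, theta):
    """Flag the input as adversarial if its reconstruction error exceeds theta."""
    return reconstruction_error(model, x) > theta
```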
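The reconstruction sub-network is described only at the architecture level: two fully connected ReLU hidden layers with 512 and 1024 units and a sigmoid output matching the input dimensionality. A plausible sketch is below; the choice of PyTorch and the class/argument names are assumptions, since the paper names no framework.

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Fully connected decoder: two ReLU hidden layers (512, 1024 units)
    and a sigmoid output with the same dimensionality as the input image
    (e.g. 28*28 = 784 for MNIST). Framework choice is an assumption."""

    def __init__(self, in_dim, out_dim=784):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, pose):
        # `pose` is the winning capsule's pose vector (CapsNet) or the
        # class-conditioned representation (CNN+R / CNN+CR variants).
        return self.decoder(pose)
```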
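The reported attack settings (ℓ∞ budget ϵ, FGSM step size 0.05, and BIM/PGD step size 0.01 with 1000/500/200 iterations for MNIST, Fashion MNIST, and SVHN) correspond to a standard ℓ∞ PGD loop such as the one sketched below; the model is assumed to return class logits, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, step_size=0.01, num_steps=1000):
    """Standard l-infinity PGD (Madry et al., 2017) with the hyperparameters
    reported in the paper, e.g. eps=0.3 and 1000 steps for MNIST.
    (Random initialization inside the eps-ball is often added in practice.)"""
    x, y = x.detach(), y.detach()
    x_adv = x.clone()
    for _ in range(num_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed gradient-ascent step, then project back into the
        # l-infinity ball of radius eps around the clean image.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)  # keep pixels in the valid [0, 1] range
    return x_adv.detach()
```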