Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Authors: David Stutz, Matthias Hein, Bernt Schiele

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009), as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms). We report confidence-thresholded test error (Err; lower is better) and confidence-thresholded robust test error (RErr; lower is better) for a confidence threshold τ corresponding to 99% true positive rate (TPR); we omit τ for brevity.
Researcher Affiliation | Academia | ¹Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken; ²University of Tübingen, Tübingen. Correspondence to: David Stutz <david.stutz@mpi-inf.mpg.de>.
Pseudocode | Yes | Algorithm 1: Confidence-Calibrated Adversarial Training (CCAT). The only changes compared to standard adversarial training are the attack (line 4) and the probability distribution over the classes (lines 6 and 7), which becomes more uniform as the distance δ increases. During testing, low-confidence (adversarial) examples are rejected. (A minimal sketch of the target distribution and loss follows the table.)
Open Source Code | Yes | We make our code (training and evaluation) and pre-trained models publicly available at davidstutz.de/ccat.
Open Datasets | Yes | We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009), as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms).
Dataset Splits | Yes | Err is computed on 9000 test examples; RErr is computed on 1000 test examples. The confidence threshold τ depends only on correctly classified clean examples and is fixed at 99% TPR on the held-out last 1000 test examples. (A threshold sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions 'implemented in PyTorch (Paszke et al., 2017)' but does not specify a version number for PyTorch or any other relevant software dependency.
Experiment Setup | Yes | Training: We train 50%/50% AT (AT-50%) and CCAT, as well as 100% AT (AT-100%), with L∞ attacks using T = 40 iterations for PGD-CE and PGD-Conf, respectively, and ϵ = 0.3 (MNIST) or ϵ = 0.03 (SVHN/Cifar10). We use ResNet-20 (He et al., 2016), implemented in PyTorch (Paszke et al., 2017), trained using stochastic gradient descent. For CCAT, we use ρ = 10. (Sketches of the training loss and attack appear below.)
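
As referenced in the Pseudocode row, the sketch below illustrates the CCAT target distribution and loss in PyTorch. It is a minimal sketch, assuming an L∞ threat model and the paper's power transition λ(δ) = (1 − min(1, ‖δ‖∞/ε))^ρ with ρ = 10; the helper names (`ccat_targets`, `ccat_loss`) are hypothetical, and `x_adv` is assumed to come from an ε-bounded attack such as PGD-Conf.

```python
import torch
import torch.nn.functional as F

def ccat_targets(x, x_adv, y, num_classes, epsilon, rho=10.0):
    # Per-example L_inf perturbation size: delta = ||x_adv - x||_inf.
    delta = (x_adv - x).flatten(1).abs().max(dim=1).values
    # Power transition: lambda = (1 - min(1, delta / epsilon)) ** rho.
    lam = (1.0 - torch.clamp(delta / epsilon, max=1.0)) ** rho
    one_hot = F.one_hot(y, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    # Interpolate between one-hot and uniform: the larger the perturbation,
    # the more uniform the target distribution becomes.
    return lam.unsqueeze(1) * one_hot + (1.0 - lam.unsqueeze(1)) * uniform

def ccat_loss(model, x, x_adv, y, num_classes, epsilon, rho=10.0):
    # Cross-entropy against the soft CCAT targets.
    targets = ccat_targets(x, x_adv, y, num_classes, epsilon, rho)
    log_probs = F.log_softmax(model(x_adv), dim=1)
    return -(targets * log_probs).sum(dim=1).mean()
```

At test time, this training objective is what makes thresholding meaningful: the model is pushed toward low (near-uniform) confidence on perturbed inputs, so rejecting low-confidence examples filters adversarial ones.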
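The setup row mentions PGD-Conf with T = 40 iterations. The paper's full attack includes momentum and backtracking, which are omitted here; the sketch keeps only the core idea of ascending on the confidence of the most likely wrong class under an L∞ constraint. The step size `alpha` and the random initialization are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_conf_sketch(model, x, y, epsilon, steps=40, alpha=None):
    # Assumed step size; the paper tunes the attack differently.
    alpha = alpha if alpha is not None else 2.0 * epsilon / steps
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(steps):
        probs = F.softmax(model(x + delta), dim=1)
        # Confidence in the most likely *wrong* class (true class zeroed out).
        wrong = probs.scatter(1, y.unsqueeze(1), 0.0)
        wrong.max(dim=1).values.sum().backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient-ascent step
            delta.clamp_(-epsilon, epsilon)           # project onto the L_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep images in [0, 1]
            delta.grad.zero_()
    return (x + delta).detach()
```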
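For the confidence-thresholded metrics (Err/RErr) and the split described in the Dataset Splits row, the sketch below shows one plausible NumPy implementation: τ is chosen at 99% TPR on the confidences of correctly classified held-out clean examples, and the thresholded error is computed over non-rejected test examples. This is an assumed reading; the paper's exact RErr additionally takes a per-example worst case over multiple attacks, which is not reproduced here.

```python
import numpy as np

def tpr_threshold(conf_correct_heldout, tpr=0.99):
    # tau such that a fraction `tpr` of correctly classified clean held-out
    # examples (here: the last 1000 test examples) keep confidence >= tau.
    return np.quantile(conf_correct_heldout, 1.0 - tpr)

def thresholded_error(confidences, predictions, labels, tau):
    # Error among examples that are not rejected at threshold tau.
    kept = confidences >= tau
    if not kept.any():
        return 0.0
    return float((predictions[kept] != labels[kept]).mean())
```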