Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks
Authors: David Stutz, Matthias Hein, Bernt Schiele
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009) as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms etc.). We report confidence-thresholded test error (Err; lower is better) and confidence-thresholded robust test error (RErr; lower is better) for a confidence threshold τ corresponding to 99% true positive rate (TPR); we omit τ for brevity (a sketch of the τ computation and thresholded error follows the table). |
| Researcher Affiliation | Academia | 1Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken 2University of Tübingen, Tübingen. Correspondence to: David Stutz <david.stutz@mpi-inf.mpg.de>. |
| Pseudocode | Yes | Algorithm 1 Confidence-Calibrated Adversarial Training (CCAT). The only changes compared to standard adversarial training are the attack (line 4) and the probability distribution over the classes (lines 6 and 7), which becomes more uniform as distance δ increases. During testing, low-confidence (adversarial) examples are rejected (a sketch of the CCAT target distribution follows the table). |
| Open Source Code | Yes | We make our code (training and evaluation) and pre-trained models publicly available at davidstutz.de/ccat. |
| Open Datasets | Yes | We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009) as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms etc.). |
| Dataset Splits | Yes | Err is computed on 9000 test examples. RErr is computed on 1000 test examples. The confidence threshold τ depends only on correctly classified clean examples and is fixed at 99% TPR on the held-out last 1000 test examples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions 'implemented in PyTorch (Paszke et al., 2017)' but does not specify a version number for PyTorch or any other relevant software dependencies. |
| Experiment Setup | Yes | Training: We train 50%/50% AT (AT-50%) and CCAT as well as 100% AT (AT-100%) with L∞ attacks using T = 40 iterations for PGD-CE and PGD-Conf, respectively, and ϵ = 0.3 (MNIST) or ϵ = 0.03 (SVHN/Cifar10). We use ResNet-20 (He et al., 2016), implemented in PyTorch (Paszke et al., 2017), trained using stochastic gradient descent. For CCAT, we use ρ = 10 (a minimal training-step sketch follows the table). |
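
To make the 99% TPR thresholding concrete, here is a minimal NumPy sketch of how τ and a confidence-thresholded error could be computed. The function and variable names (`compute_threshold`, `thresholded_error`, `confidences`, `correct`) are illustrative, not from the authors' released code, and the paper's exact RErr bookkeeping may differ in detail.

```python
import numpy as np

def compute_threshold(confidences, correct, tpr=0.99):
    """Pick tau so that a `tpr` fraction of correctly classified clean
    (held-out) examples keeps confidence >= tau (99% TPR in the paper)."""
    tp_conf = np.sort(confidences[correct])
    # Lowest confidence still retained when keeping `tpr` of the true positives.
    idx = int(np.floor((1.0 - tpr) * len(tp_conf)))
    return tp_conf[idx]

def thresholded_error(confidences, correct, tau):
    """Fraction of non-rejected examples (confidence >= tau) that are
    misclassified; rejected examples do not count as errors."""
    accepted = confidences >= tau
    if not accepted.any():
        return 0.0
    return float(np.mean(~correct[accepted]))
```

As the Dataset Splits row states, τ is fit once on the held-out last 1000 clean test examples and then reused unchanged for both Err and RErr.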
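
The Pseudocode row summarizes Algorithm 1; the heart of CCAT is a soft target distribution that decays from the one-hot label toward uniform as the perturbation grows. Below is a PyTorch sketch of that target and of cross-entropy against soft targets, assuming the paper's schedule λ(δ) = (1 − min(1, ‖δ‖∞/ϵ))^ρ with ρ = 10; the function names are ours.

```python
import torch
import torch.nn.functional as F

def ccat_target(y, delta, eps, num_classes, rho=10.0):
    """CCAT target: lam * one_hot(y) + (1 - lam) * uniform, where
    lam = (1 - min(1, ||delta||_inf / eps))**rho decays to 0 (uniform)
    as delta approaches the boundary of the epsilon-ball."""
    norms = delta.flatten(1).abs().amax(dim=1)            # ||delta||_inf per example
    lam = (1.0 - torch.clamp(norms / eps, max=1.0)) ** rho
    one_hot = F.one_hot(y, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    return lam.unsqueeze(1) * one_hot + (1.0 - lam.unsqueeze(1)) * uniform

def soft_cross_entropy(logits, target):
    """Cross-entropy of the model's predictive distribution against a soft target."""
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

This soft target is what enables rejection at test time: the model is pushed toward low confidence on perturbed inputs, so thresholding at τ rejects them, including perturbations unseen during training.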
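
Finally, the Experiment Setup row can be read as the following hypothetical training step, reusing `ccat_target` and `soft_cross_entropy` from the sketch above and assuming the half-clean / half-adversarial batch composition implied by "50%/50%"; `pgd_conf_attack` is a placeholder for the paper's PGD-Conf attack (T = 40 iterations), not an implementation of it.

```python
import torch
import torch.nn.functional as F

def ccat_training_step(model, optimizer, x, y, eps, num_classes,
                       pgd_conf_attack, rho=10.0):
    """One SGD step on a half-clean / half-adversarial batch (a sketch)."""
    b = x.size(0) // 2
    # Attack the second half of the batch with PGD-Conf (placeholder helper).
    delta = pgd_conf_attack(model, x[b:], y[b:], eps)
    x_adv = torch.clamp(x[b:] + delta, 0.0, 1.0)  # assumes inputs in [0, 1]

    # Clean half keeps one-hot targets (lam = 1 when delta = 0); the
    # adversarial half gets targets pulled toward uniform by lam(delta).
    target_clean = F.one_hot(y[:b], num_classes).float()
    target_adv = ccat_target(y[b:], delta, eps, num_classes, rho)

    logits = model(torch.cat([x[:b], x_adv], dim=0))
    target = torch.cat([target_clean, target_adv], dim=0)
    loss = soft_cross_entropy(logits, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```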