Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Authors: David Stutz, Matthias Hein, Bernt Schiele

ICML 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009) as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms etc.). We report confidence-thresholded test error (Err; lower is better) and confidence-thresholded robust test error (RErr; lower is better) for a confidence-threshold τ corresponding to 99% true positive rate (TPR); we omit τ for brevity.
Researcher Affiliation Academia 1 Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken; 2 University of Tübingen, Tübingen. Correspondence to: David Stutz <EMAIL>.
Pseudocode Yes Algorithm 1 Confidence-Calibrated Adversarial Training (CCAT). The only changes compared to standard adversarial training are the attack (line 4) and the probability distribution over the classes (lines 6 and 7), which becomes more uniform as distance δ increases. During testing, low-confidence (adversarial) examples are rejected.
Open Source Code Yes We make our code (training and evaluation) and pre-trained models publicly available at davidstutz.de/ccat.
Open Datasets Yes We evaluate CCAT in comparison with AT (Madry et al., 2018) and related work (Maini et al., 2020; Zhang et al., 2019) on MNIST (LeCun et al., 1998), SVHN (Netzer et al., 2011) and Cifar10 (Krizhevsky, 2009) as well as MNIST-C (Mu & Gilmer, 2019) and Cifar10-C (Hendrycks & Dietterich, 2019) with corrupted examples (e.g., blur, noise, compression, transforms etc.).
Dataset Splits Yes Err is computed on 9000 test examples. RErr is computed on 1000 test examples. The confidence threshold τ depends only on correctly classified clean examples and is fixed at 99% TPR on the held-out last 1000 test examples.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies No The paper mentions 'implemented in PyTorch (Paszke et al., 2017)' but does not specify a version number for PyTorch or any other relevant software dependencies.
Experiment Setup Yes Training: We train 50%/50% AT (AT-50%) and CCAT as well as 100% AT (AT-100%) with L∞ attacks using T = 40 iterations for PGD-CE and PGD-Conf, respectively, and ϵ = 0.3 (MNIST) or ϵ = 0.03 (SVHN/Cifar10). We use ResNet-20 (He et al., 2016), implemented in PyTorch (Paszke et al., 2017), trained using stochastic gradient descent. For CCAT, we use ρ = 10.
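
Illustrative sketch (not the authors' code): the Pseudocode and Dataset Splits rows describe two mechanisms that a short example may make more concrete. One is a target distribution that becomes more uniform as the perturbation δ grows. The other is a confidence threshold τ fixed at a given TPR on correctly classified clean examples, which is then used to reject low-confidence inputs. The Python below is a minimal sketch under stated assumptions. The function names are hypothetical. The power-law transition (1 - min(1, ||δ||/ϵ))^ρ is an assumed form that is consistent with the ρ = 10 setting quoted above but is not spelled out on this page. Err is assumed to be the misclassification rate among non-rejected examples.

```python
import math


def ccat_target(label, num_classes, delta_norm, epsilon, rho=10.0):
    """Target distribution that moves from one-hot to uniform as delta grows.

    Assumption: a power-law transition with exponent rho; the page itself
    only says the distribution "becomes more uniform as distance increases".
    """
    # lam is 1 for an unperturbed input and 0 once ||delta|| reaches epsilon.
    lam = (1.0 - min(1.0, delta_norm / epsilon)) ** rho
    uniform = 1.0 / num_classes
    return [
        lam * (1.0 if k == label else 0.0) + (1.0 - lam) * uniform
        for k in range(num_classes)
    ]


def threshold_at_tpr(confidences, correct, tpr=0.99):
    """Largest tau keeping at least `tpr` of correctly classified clean examples."""
    kept = sorted((c for c, ok in zip(confidences, correct) if ok), reverse=True)
    if not kept:
        raise ValueError("no correctly classified examples to calibrate on")
    n_keep = math.ceil(tpr * len(kept))
    return kept[n_keep - 1]


def thresholded_error(confidences, correct, tau):
    """Error rate among examples that are not rejected (confidence >= tau)."""
    accepted = [ok for c, ok in zip(confidences, correct) if c >= tau]
    if not accepted:
        return 0.0  # everything was rejected, so no accepted example is wrong
    return sum(1 for ok in accepted if not ok) / len(accepted)
```

In this sketch, τ would be calibrated on the held-out clean examples only and then reused unchanged when scoring both clean and adversarial inputs, mirroring the split described in the Dataset Splits row.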