Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training
Authors: Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a detailed empirical study over CIFAR-10 for ℓ∞ attacks. We reuse the robust ResNet model trained by Madry et al. as the base model, and use HCNN_ξ. We modify state-of-the-art ℓ∞ attacks, such as the CW attack (Carlini & Wagner, 2017a) and the PGD attack (Madry et al., 2017), to exploit confidence information in order to break our method by generating high-confidence attacks. We first empirically validate that Madry et al.'s model is better, in view of our probabilistic separation property, than models trained without a robustness objective. We then evaluate using confidence to reject adversarial examples, and finally report end-to-end defense results. Our results are both encouraging and discouraging: for small radii, we find that confidence is indeed a good discriminator between right and wrong predictions, and it does improve adversarial robustness. (See the confidence-rejection sketch after the table.) |
| Researcher Affiliation | Collaboration | Xi Wu*1, Uyeong Jang*2, Jiefeng Chen2, Lingjiao Chen2, Somesh Jha2. 1Google; 2University of Wisconsin-Madison. Correspondence to: Xi Wu <xiwu@cs.wisc.edu>. |
| Pseudocode | Yes | Algorithm 1: Solving HCNN_ξ by solving for each label. Input: x a feature vector, ξ > 0 a real parameter, λ ≥ 0 a real parameter, a base model F, and any gradient-based optimization algorithm O to solve the constrained optimization problem defined in (5). 1: function OracleHCNN(x, ξ, F); 2: for l ∈ C do; 3: z(l) ← O(x, F, l); 4: return z(l*) where l* = arg max_{l∈C} { F(z(l))_l − λ‖z(l) − x‖ }. (See the Python sketch after the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a repository link or an explicit statement about code release. |
| Open Datasets | Yes | Overall Setup. We study the above questions using ℓ∞ attacks over CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper mentions using a 'test set' for evaluation but does not specify the train/validation/test dataset splits, percentages, or methodology needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using attacks like CW and PGD but does not provide specific ancillary software details with version numbers, such as library or framework versions. |
| Experiment Setup | Yes | We use a strengthened version of the PGD attack from (Madry et al., 2017) (with ℓ∞ radius η, 10 random starts, and 100 iterations) to first generate, for each wrong label, an adversarial example whose model confidence is as large as possible. Our implementation of MCN_ξ solves (5) using the PGD attack with a different setting (ℓ∞ radius ξ, no random start, and 500 iterations). (See the PGD sketch after the table.) |
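
The "Research Type" row describes using model confidence to reject adversarial examples. Below is a minimal sketch of such a rejection rule, assuming a PyTorch classifier that outputs logits; `predict_or_reject` and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def predict_or_reject(model, x, threshold=0.5):
    """Return (label, confidence), or (None, confidence) when the top
    softmax probability falls below `threshold` (i.e. reject the input).
    Hypothetical illustration of confidence-based rejection; the
    threshold value is arbitrary."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)    # x: single-example batch of shape (1, C, H, W)
        confidence, label = probs.max(dim=1)  # top-1 probability and predicted class
    if confidence.item() < threshold:
        return None, confidence.item()        # abstain on low-confidence inputs
    return label.item(), confidence.item()
```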
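
The "Pseudocode" row transcribes Algorithm 1. The following is a minimal reading of that algorithm in Python, assuming a caller-supplied gradient-based solver `solve_constrained` for the constrained problem (5); all names are illustrative and the inner solver is left abstract.

```python
import torch
import torch.nn.functional as F

def oracle_hcnn(model, x, xi, lam, labels, solve_constrained):
    """Sketch of Algorithm 1: for each candidate label l, call a
    gradient-based solver for the constrained problem (5) to obtain z(l),
    then return the z(l*) maximizing F(z(l))_l - lam * ||z(l) - x||."""
    best_z, best_score = None, float("-inf")
    for l in labels:
        z = solve_constrained(model, x, l, xi)          # z(l) <- O(x, F, l)
        with torch.no_grad():
            conf_l = F.softmax(model(z), dim=1)[0, l]   # F(z(l))_l
            score = conf_l - lam * (z - x).norm()       # confidence minus distance penalty
        if score.item() > best_score:
            best_score, best_z = score.item(), z
    return best_z
```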
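
The "Experiment Setup" row describes a strengthened PGD attack that pushes confidence on a wrong label as high as possible within an ℓ∞ ball, using random starts and a fixed iteration budget. Here is a rough sketch of such an attack, assuming a PyTorch model with inputs in [0, 1]; the confidence loss, step size, and bookkeeping are assumptions rather than the authors' exact attack.

```python
import torch
import torch.nn.functional as F

def pgd_high_confidence(model, x, target, eps, steps=100, restarts=10, step_size=None):
    """ell_inf PGD sketch: maximize the model's confidence on `target`
    inside an eps-ball around x, keeping the best of several random starts.
    Illustrative only; not the authors' strengthened attack."""
    step_size = step_size if step_size is not None else 2.5 * eps / steps
    best_adv, best_conf = x.clone(), -1.0
    for _ in range(restarts):
        delta = torch.empty_like(x).uniform_(-eps, eps)   # random start in the ball
        for _ in range(steps):
            delta.requires_grad_(True)
            conf = F.softmax(model(x + delta), dim=1)[0, target]
            grad = torch.autograd.grad(conf, delta)[0]
            with torch.no_grad():
                delta = (delta + step_size * grad.sign()).clamp(-eps, eps)  # ascend and project
                delta = (x + delta).clamp(0, 1) - x                         # keep valid pixels
        with torch.no_grad():
            conf = F.softmax(model(x + delta), dim=1)[0, target].item()
        if conf > best_conf:
            best_conf, best_adv = conf, (x + delta).detach()
    return best_adv, best_conf
```

With `restarts=10` and `steps=100` this mirrors the high-confidence generator setting quoted above; dropping the random start and raising the iteration count to 500 would correspond to the MCN_ξ setting, under the same caveat that this is only a sketch.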