How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy

Authors: Raphael Olivier, Bhiksha Raj

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We run experiments in L2 and L∞ perturbations on the CIFAR10 dataset (Krizhevsky et al., 2009). Additional results on ImageNet can be found in Appendix C. We use attack radii ϵ = 0.5 and ϵ = 8/255 respectively. The results of adversarially defended models are reported in Table 1. |
| Researcher Affiliation | Academia | Raphael Olivier¹, Bhiksha Raj¹. ¹Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Raphael Olivier <rolivier@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: L2 sparsity computation algorithm (an illustrative sketch follows this table) |
| Open Source Code | Yes | Our code is available at https://github.com/RaphaelOlivier/sparsity |
| Open Datasets | Yes | We run experiments in L2 and L∞ perturbations on the CIFAR10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | We report averaged values of sparsity over the first 1000 vulnerable inputs in the CIFAR10 test set and, for each input, 100 random directions. |
| Hardware Specification | Yes | On CIFAR10, computing adversarial sparsity around an input point with 100 directions, 10 search steps, and 20 PGD iterations takes a few seconds for a ResNet-18 model on an Nvidia RTX 2080 Ti. |
| Software Dependencies | No | The paper mentions using specific attacks (PGD, APGD-CE, APGD-DLR) and models (ResNet-18), but does not provide specific software environment versions such as Python, PyTorch/TensorFlow, or library versions. |
| Experiment Setup | Yes | We run experiments in L2 and L∞ perturbations on the CIFAR10 dataset (Krizhevsky et al., 2009). Additional results on ImageNet can be found in Appendix C. We use attack radii ϵ = 0.5 and ϵ = 8/255 respectively. (A hedged PGD sketch using these radii follows this table.) |
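The experiment-setup row quotes the attack radii but not the attack configuration. As a rough illustration only (not the authors' exact settings; the step size, iteration count, and random start are assumptions), a standard PGD L∞ attack at the quoted radius ϵ = 8/255 could look like this PyTorch sketch:

```python
# Hypothetical PGD L-infinity attack at the quoted CIFAR10 radius eps = 8/255.
# Step size and iteration count are assumptions, not the paper's configuration.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Return adversarial examples for a batch (x, y); model is assumed to take inputs in [0, 1]."""
    x_adv = x.clone().detach()
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)  # random start inside the ball
    x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()             # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)    # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```

The L2 case (ϵ = 0.5) would replace the sign step and box projection with gradient normalization and an L2-ball projection; the paper also reports APGD-CE and APGD-DLR results, which this sketch does not cover.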
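The pseudocode and hardware rows refer to the paper's Algorithm 1, which is not reproduced in this summary. The sketch below is a simplified stand-in, assuming sparsity around a vulnerable input is probed by sampling random L2 directions (100 per input) and binary-searching the perturbation magnitude along each direction (10 steps, capped at ϵ = 0.5); the per-step PGD refinement (20 iterations) mentioned in the hardware quote is omitted, and the model, input, and label are placeholders.

```python
# Illustrative sketch only: NOT the paper's Algorithm 1. It estimates, for one input,
# the fraction of random L2 directions within the eps-ball along which the model's
# prediction flips, using a binary search over the perturbation magnitude.
import torch

def direction_flips(model, x, y, direction, eps=0.5, search_steps=10):
    """Binary-search the magnitude along `direction` for a label flip within the eps-ball."""
    lo, hi = 0.0, eps
    flipped = False
    for _ in range(search_steps):
        mid = (lo + hi) / 2
        x_adv = torch.clamp(x + mid * direction, 0.0, 1.0)
        pred = model(x_adv.unsqueeze(0)).argmax(dim=1).item()
        if pred != y:            # misclassified: shrink the magnitude
            flipped, hi = True, mid
        else:                    # still correct: push further out
            lo = mid
    return flipped

def adversarial_direction_fraction(model, x, y, eps=0.5, n_directions=100):
    """Fraction of sampled directions containing an adversarial example within the eps-ball.
    How this maps onto the paper's sparsity metric is defined by their Algorithm 1."""
    model.eval()
    flips = 0
    with torch.no_grad():
        for _ in range(n_directions):
            d = torch.randn_like(x)
            d = d / d.norm()      # random L2 unit direction
            flips += direction_flips(model, x, y, d, eps)
    return flips / n_directions
```

For a ResNet-18 on CIFAR10 this mirrors the 100-direction, 10-search-step budget quoted in the hardware row, and runs quickly on a single GPU since it only needs forward passes.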