OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

Authors: Lin Li, Yifei Wang, Chawin Sitawarin, Michael W. Spratling

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e., naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. (A linear-fit sketch appears after this table.)
Researcher Affiliation | Academia | 1Department of Informatics, King's College London, UK; 2MIT CSAIL, USA; 3UC Berkeley, USA; 4University of Luxembourg, Luxembourg.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Code and models are available at: https://github.com/OODRobustBench/OODRobustBench
Open Datasets | Yes | OODRobustBench includes multiple types of dataset shifts from two sources: natural and corruption. For natural shifts, we adopt four different variant datasets per source dataset: CIFAR10.1 (Recht et al., 2018), CIFAR10.2 (Lu et al., 2020), CINIC (Darlow et al., 2018), and CIFAR10-R (Hendrycks et al., 2021a) for CIFAR10, and ImageNetv2 (Recht et al., 2019), ImageNet-A (Hendrycks et al., 2021b), ImageNet-R (Hendrycks et al., 2021a), and ObjectNet (Barbu et al., 2019) for ImageNet.
Dataset Splits | No | No explicit mention of validation dataset splits or of a validation methodology is provided; the paper makes only general references to training and testing.
Hardware Specification | No | The authors acknowledge the use of the research computing facility at King's College London, King's Computational Research, Engineering and Technology Environment (CREATE).
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | We train all models under both ℓ∞ and ℓ2 threat models with the following steps: 1. We use PGD adversarial training to train eight models with batch size in {128, 512}, learning rate in {0.1, 0.05}, and weight decay in {10⁻⁴, 10⁻⁵}. We also save the overall best hyperparameter choice. For the ℓ2 threat model, we fix the learning rate to 0.1 since we observe that with ℓ∞, 0.1 is strictly better than 0.05. 2. Using the best hyperparameter choice, we train one model with PGD-SCORE, three with TRADES, and three with TRADES-SCORE. For TRADES and TRADES-SCORE, we take their β parameter from {0.1, 0.3, 1.0}.
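The Experiment Setup row describes a 2×2×2 hyperparameter grid for PGD adversarial training, followed by a second stage that reuses the best configuration. The following is a minimal sketch of that sweep, not the paper's actual training code: the train() function is a hypothetical stand-in that would, in the real benchmark, run a full adversarial-training job and return robust accuracy; only the grid values and the two-step structure are taken from the row above.

```python
import itertools
import random

def train(method, threat, batch_size, lr, weight_decay, beta=None):
    """Hypothetical stand-in for one adversarial-training run.
    The real benchmark trains a network with the given method and
    hyperparameters and returns its robust accuracy."""
    return random.random()  # placeholder robust accuracy

def sweep(threat):
    # Step 1: PGD-AT grid search over 2 batch sizes x 2 learning rates
    # x 2 weight decays (8 runs). For the l2 threat model the learning
    # rate is fixed to 0.1, as stated in the Experiment Setup row.
    lrs = [0.1] if threat == "l2" else [0.1, 0.05]
    grid = itertools.product([128, 512], lrs, [1e-4, 1e-5])
    results = {hp: train("PGD-AT", threat, *hp) for hp in grid}
    bs, lr, wd = max(results, key=results.get)  # best hyperparameter choice

    # Step 2: reuse the best choice for PGD-SCORE, TRADES, TRADES-SCORE,
    # sweeping the TRADES trade-off parameter beta over {0.1, 0.3, 1.0}.
    train("PGD-SCORE", threat, bs, lr, wd)
    for beta in (0.1, 0.3, 1.0):
        train("TRADES", threat, bs, lr, wd, beta=beta)
        train("TRADES-SCORE", threat, bs, lr, wd, beta=beta)

for threat in ("linf", "l2"):
    sweep(threat)
```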
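The Research Type row cites a strong positive linear correlation between ID and OOD robustness across the 706 evaluated models. The sketch below shows one standard way such a relationship is quantified: an ordinary least-squares line plus Pearson's r. The accuracy arrays are illustrative made-up numbers, not the paper's measurements.

```python
import numpy as np

# Illustrative per-model robust accuracies (NOT the paper's data):
# ID robustness vs. robustness under a distribution shift.
id_rob  = np.array([0.45, 0.50, 0.55, 0.60, 0.65])
ood_rob = np.array([0.30, 0.34, 0.37, 0.42, 0.45])

# Fit the linear trend ood = a * id + b and report Pearson's r,
# the statistic behind a "strong positive linear correlation" claim.
a, b = np.polyfit(id_rob, ood_rob, deg=1)
r = np.corrcoef(id_rob, ood_rob)[0, 1]
print(f"slope={a:.3f}, intercept={b:.3f}, Pearson r={r:.3f}")
```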