Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
Authors: Brian R. Bartoldson, James Diffenderfer, Konstantinos Parasyris, Bhavya Kailkhura
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about 100%, but SOTA robustness to ℓ∞-norm bounded perturbations barely exceeds 70%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with 20% (70%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving 74% AutoAttack accuracy (+3% gain). |
| Researcher Affiliation | Academia | Lawrence Livermore National Laboratory. Correspondence to: Brian R. Bartoldson <bartoldson@llnl.gov>, Bhavya Kailkhura <kailkhura1@llnl.gov>. |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | Yes | See code or take online quiz via https://github.com/bbartoldson/Adversarial-Robustness-Limits. |
| Open Datasets | Yes | Taking CIFAR10 as an example, SOTA clean accuracy is about 100%, but SOTA robustness to ℓ∞-norm bounded perturbations barely exceeds 70%. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test split percentages or sample counts for a validation set. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU model, CPU type) used for running its experiments. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Specifically, we use: 10 PGD steps to adversarially attack images at training time with step size α = 2/255; label smoothing (Szegedy et al., 2016) 0.1; synthetic CIFAR10 data from generative models in our training datasets; the TRADES loss (Zhang et al., 2019) with β = 5; weight averaging (Izmailov et al., 2018) with decay rate τ = 0.995; SGD optimization with Nesterov momentum (Nesterov, 1983) set to 0.9 and weight decay 5 × 10⁻⁴; and a cyclic learning rate schedule with cosine annealing (Smith & Topin, 2017). The learning rate and batch size we use vary based on dataset size according to optimal settings found by hyperparameter search: datasets with 10M or fewer samples use batch size 1024, otherwise batch size is 2048; the learning rate is 0.3 for datasets with 10M or fewer samples, 0.2 for larger datasets with up to 200M samples, and 0.1 for datasets with 300M samples (see Appendix D). (An illustrative training-step sketch follows the table.) |
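
For reference, here is a minimal PyTorch sketch of one adversarial-training step consistent with the quoted setup (10-step PGD with α = 2/255, TRADES β = 5, label smoothing 0.1, EMA weight averaging with τ = 0.995, Nesterov SGD with weight decay 5 × 10⁻⁴, and a cyclic cosine learning-rate schedule). It is not the authors' implementation: the ε = 8/255 budget, the ResNet-18 backbone, the total step count, and all helper names are assumptions made for illustration.

```python
# Hedged sketch (not the authors' code) of one adversarial-training step with the
# quoted hyperparameters. eps = 8/255, the ResNet-18 backbone, total_steps, and
# helper names are illustrative assumptions.
import copy

import torch
import torch.nn.functional as F
from torch import nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

model = models.resnet18(num_classes=10).to(device)  # placeholder; the paper scales WideResNets
ema_model = copy.deepcopy(model)                    # weight-averaged copy, tau = 0.995
for p in ema_model.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(model.parameters(), lr=0.3, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
# Cyclic learning rate with cosine annealing; total_steps is a placeholder.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.3, total_steps=10_000, anneal_strategy="cos")

ce = nn.CrossEntropyLoss(label_smoothing=0.1)       # label smoothing 0.1
EPS, ALPHA, STEPS, BETA, TAU = 8 / 255, 2 / 255, 10, 5.0, 0.995


def trades_pgd(x):
    """10-step PGD that maximizes the TRADES KL term (step size alpha = 2/255)."""
    model.eval()
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
    x_adv = x + 0.001 * torch.randn_like(x)          # small random start
    for _ in range(STEPS):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean,
                      reduction="batchmean")
        grad = torch.autograd.grad(kl, x_adv)[0]
        x_adv = x_adv.detach() + ALPHA * grad.sign()
        # Project back onto the l_inf ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - EPS), x + EPS).clamp(0, 1)
    model.train()
    return x_adv.detach()


@torch.no_grad()
def ema_update():
    # Parameter-only EMA; buffers (e.g., BatchNorm stats) would also need handling.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(TAU).add_(p, alpha=1 - TAU)


def train_step(x, y):
    x, y = x.to(device), y.to(device)
    x_adv = trades_pgd(x)
    optimizer.zero_grad()
    logits_clean, logits_adv = model(x), model(x_adv)
    # TRADES objective: clean cross-entropy + beta * KL(clean || adversarial)
    loss = ce(logits_clean, y) + BETA * F.kl_div(
        F.log_softmax(logits_adv, dim=1), F.softmax(logits_clean, dim=1),
        reduction="batchmean")
    loss.backward()
    optimizer.step()
    scheduler.step()
    ema_update()
    return loss.item()
```

Under these assumptions, `train_step(x, y)` would be called once per batch from a CIFAR10 loader with batch size 1024 (the quoted setting for datasets of 10M or fewer samples), and robustness evaluation (e.g., AutoAttack) would be run against `ema_model`, the weight-averaged copy.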