Explicit Tradeoffs between Adversarial and Natural Distributional Robustness
Authors: Mazda Moayeri, Kiarash Banihashem, Soheil Feizi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first consider a simple linear regression setting on Gaussian data with disjoint sets of core and spurious features. In this setting, through theoretical and empirical analysis, we show that (i) adversarial training with ℓ∞ and ℓ2 norms increases the model reliance on spurious features; (ii) For ℓ∞ adversarial training, spurious reliance only occurs when the scale of the spurious features is larger than that of the core features; (iii) adversarial training can have an unintended consequence in reducing distributional robustness, specifically when spurious correlations are changed in the new test domain. Next, we present extensive empirical evidence, using a test suite of twenty adversarially trained models evaluated on five benchmark datasets (ObjectNet, RIVAL10, Salient ImageNet-1M, ImageNet-9, Waterbirds), that adversarially trained classifiers rely on backgrounds more than their standardly trained counterparts, validating our theoretical results. (A standard robust-regression identity illustrating the norm dependence is sketched below the table.) |
| Researcher Affiliation | Academia | Mazda Moayeri (mmoayeri@umd.edu), Kiarash Banihashem (kiarash@umd.edu), Soheil Feizi (sfeizi@cs.umd.edu), Department of Computer Science, University of Maryland |
| Pseudocode | No | The paper contains mathematical derivations and problem formulations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the methodology is openly available. |
| Open Datasets | Yes | We evaluate models on two backbones (ResNet18, ResNet50) adversarially trained on ImageNet [12] using two norms (ℓ2, ℓ∞) under five attack budgets (denoted ε) per norm, resulting in a 2 × 2 × 5 = 20 model test suite, as well as standardly trained baselines. We appeal to the ImageNet-C [22] and ObjectNet [5] OOD benchmarks. We now directly quantify sensitivity to core features via RIVAL10 and Salient ImageNet-1M datasets [42, 61]. Now, we take a closer look at the reliance of adversarially trained models on the contextual spurious feature of backgrounds via the synthetic datasets ImageNet-9 [69] and Waterbirds [50]. We train ResNet18s on CIFAR10 [33]... |
| Dataset Splits | Yes | We train only a final linear layer atop the frozen feature extractors (so that models remain adversarially robust) for each of our models on the Waterbirds training set for ten epochs, saving the model with highest validation accuracy. The test set is evenly split between these groups. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud instance specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We evaluate models on two backbones (ResNet18, ResNet50) adversarially trained on ImageNet [12] using two norms (ℓ2, ℓ∞) under five attack budgets (denoted ε) per norm, resulting in a 2 × 2 × 5 = 20 model test suite, as well as standardly trained baselines. We train only a final linear layer atop the frozen feature extractors (so that models remain adversarially robust) for each of our models on the Waterbirds training set for ten epochs, saving the model with highest validation accuracy. (A minimal linear-probe sketch of this setup follows the table.) |
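
The Research Type row quotes the paper's linear-regression analysis of ℓ∞ and ℓ2 adversarial training. As background, the following is a standard worst-case-loss identity for a linear model; it is a sketch of why ℓ∞-bounded training acts like an ℓ1 (lasso) penalty on the weights and ℓ2-bounded training like an ℓ2 penalty, not a reproduction of the paper's own derivation.

```latex
% Worst-case squared loss of a linear predictor w under an input perturbation
% of budget \epsilon (standard dual-norm identity; the paper's formulation may differ).
\max_{\|\delta\|_\infty \le \epsilon} \bigl(y - w^\top (x + \delta)\bigr)^2
  = \bigl(|y - w^\top x| + \epsilon \|w\|_1\bigr)^2,
\qquad
\max_{\|\delta\|_2 \le \epsilon} \bigl(y - w^\top (x + \delta)\bigr)^2
  = \bigl(|y - w^\top x| + \epsilon \|w\|_2\bigr)^2 .
```

Under the induced ℓ1 penalty, a feature earns a nonzero weight only when its scale is large enough to offset the penalty, which is consistent with the quoted finding that ℓ∞ adversarial training relies on spurious features only when their scale exceeds that of the core features.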
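
The Dataset Splits and Experiment Setup rows describe a linear probe: only a final linear layer is trained on top of frozen, adversarially trained features, on the Waterbirds training set, for ten epochs, keeping the checkpoint with the highest validation accuracy. A minimal sketch of that procedure is below; the placeholder data loaders, the optimizer, the learning rate, and the checkpoint handling are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Placeholder loaders standing in for the Waterbirds train/validation splits (assumption).
train_loader = DataLoader(TensorDataset(torch.randn(64, 3, 224, 224),
                                        torch.randint(0, 2, (64,))), batch_size=32)
val_loader = DataLoader(TensorDataset(torch.randn(32, 3, 224, 224),
                                      torch.randint(0, 2, (32,))), batch_size=32)

# Frozen backbone; in practice an adversarially trained ImageNet checkpoint would be
# loaded here (e.g. one of the ℓ2/ℓ∞ robust ResNet18s from the paper's test suite).
backbone = models.resnet18()
backbone.fc = nn.Identity()              # expose the 512-d penultimate features
for p in backbone.parameters():
    p.requires_grad = False              # freezing keeps the robust features intact
backbone.eval()

probe = nn.Linear(512, 2)                # Waterbirds is a binary (landbird/waterbird) task
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-3, momentum=0.9)  # assumed hyperparameters
criterion = nn.CrossEntropyLoss()

best_val_acc, best_state = 0.0, None
for epoch in range(10):                  # "ten epochs" per the quoted setup
    probe.train()
    for x, y in train_loader:
        with torch.no_grad():
            feats = backbone(x)          # backbone stays frozen
        loss = criterion(probe(feats), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Keep the probe with the highest validation accuracy, as described.
    probe.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in val_loader:
            preds = probe(backbone(x)).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.numel()
    val_acc = correct / total
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = {k: v.clone() for k, v in probe.state_dict().items()}

probe.load_state_dict(best_state)
```

The selected probe could then be evaluated on the group-balanced Waterbirds test set to compare background reliance across robust and standard backbones.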