Revisiting adapters with adversarial training
Authors: Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. (A hedged interpolation sketch follows the table.) |
| Researcher Affiliation | Industry | Sylvestre-Alvise Rebuffi, Francesco Croce & Sven Gowal, DeepMind, London {sylvestre,sgowal}@deepmind.com |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps for a method in a code-like format. |
| Open Source Code | No | We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019). |
| Open Datasets | Yes | We focus our experimental evaluation on the ImageNet dataset (Russakovsky et al., 2015)... Moreover, we test the robustness against distribution shifts via several ImageNet variants: ImageNet-C (Hendrycks & Dietterich, 2018), ImageNet-A (Hendrycks et al., 2019), ImageNet-R (Hendrycks et al., 2020), ImageNet-Sketch (Wang et al., 2019), and Conflict Stimuli (Geirhos et al., 2018). |
| Dataset Splits | Yes | We report clean and adversarial accuracy on the whole validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019). |
| Experiment Setup | Yes | The model is optimized for 300 epochs using the AdamW optimizer (Loshchilov & Hutter, 2017) with momenta β1 = 0.9, β2 = 0.95, with a weight decay of 0.3 and a cosine learning rate decay with base learning rate 1e-4 and linear ramp-up of 20 epochs. The batch size is set to 4096 and we scale the learning rates using the linear scaling rule of Goyal et al. (2017). We optimize the standard cross-entropy loss and we use a label smoothing of 0.1. (A hedged optimizer sketch follows the table.) |
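
The Experiment Setup row quotes enough detail to sketch the optimizer and schedule in code. The following is a minimal PyTorch sketch, not the authors' implementation: the function name, the step-wise schedule, and the reference batch size used for the linear scaling rule are assumptions (the quoted text does not state the reference batch size).

```python
# Hedged sketch of the reported setup: AdamW with betas (0.9, 0.95), weight decay 0.3,
# cosine decay from a base LR of 1e-4 with a 20-epoch linear ramp-up, batch size 4096,
# and cross-entropy with label smoothing 0.1. All names here are illustrative.
import math
import torch
import torch.nn as nn

def build_optimizer_and_schedule(model: nn.Module,
                                 steps_per_epoch: int,
                                 epochs: int = 300,
                                 warmup_epochs: int = 20,
                                 base_lr: float = 1e-4,
                                 batch_size: int = 4096,
                                 reference_batch_size: int = 4096):  # assumed reference
    # Linear scaling rule (Goyal et al., 2017): scale the LR with the batch size.
    lr = base_lr * batch_size / reference_batch_size

    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=lr,
                                  betas=(0.9, 0.95),
                                  weight_decay=0.3)

    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = epochs * steps_per_epoch

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)                    # linear ramp-up
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay

    schedule = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)            # standard CE + smoothing
    return optimizer, schedule, loss_fn
```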
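
For the adversarial model soups mentioned in the Research Type row (linear combinations of the clean and adversarial tokens), the sketch below shows the interpolation as a convex combination of two adapter tensors, assuming the clean and adversarial adapters are plain tensors of equal shape; the function and attribute names are hypothetical, not the authors' code.

```python
# Hedged sketch of an "adversarial model soup": linearly combining the adapter
# parameters trained on clean and on adversarial data, with the shared backbone frozen.
import torch

def adversarial_model_soup(clean_token: torch.Tensor,
                           adv_token: torch.Tensor,
                           alpha: float) -> torch.Tensor:
    """Return (1 - alpha) * clean adapter + alpha * adversarial adapter.

    alpha = 0 recovers the nominally trained adapter and alpha = 1 the
    adversarially trained one; intermediate values trade off clean and
    robust accuracy with a single forward pass per input.
    """
    assert clean_token.shape == adv_token.shape
    return (1.0 - alpha) * clean_token + alpha * adv_token

# Usage: sweep alpha and plug the souped adapter back into the frozen ViT.
# clean_tok, adv_tok = model.clean_cls_token, model.adv_cls_token  # hypothetical attributes
# for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
#     souped = adversarial_model_soup(clean_tok, adv_tok, alpha)
```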