Revisiting adapters with adversarial training

Authors: Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens." (see the interpolation sketch below the table) |
| Researcher Affiliation | Industry | Sylvestre-Alvise Rebuffi, Francesco Croce & Sven Gowal, DeepMind, London, {sylvestre,sgowal}@deepmind.com |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps for a method in a code-like format. |
| Open Source Code | No | "We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019)." |
| Open Datasets | Yes | "We focus our experimental evaluation on the ImageNet dataset (Russakovsky et al., 2015)... Moreover, we test the robustness against distribution shifts via several ImageNet variants: ImageNet-C (Hendrycks & Dietterich, 2018), ImageNet-A (Hendrycks et al., 2019), ImageNet-R (Hendrycks et al., 2020), ImageNet-Sketch (Wang et al., 2019), and Conflict Stimuli (Geirhos et al., 2018)." |
| Dataset Splits | Yes | "We report clean and adversarial accuracy on the whole validation set." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | "We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019)." (see the RandAugment usage sketch below the table) |
| Experiment Setup | Yes | "The model is optimized for 300 epochs using the AdamW optimizer (Loshchilov & Hutter, 2017) with momenta β1 = 0.9, β2 = 0.95, with a weight decay of 0.3 and a cosine learning rate decay with base learning rate 1e-4 and linear ramp-up of 20 epochs. The batch size is set to 4096 and we scale the learning rates using the linear scaling rule of Goyal et al. (2017). We optimize the standard cross-entropy loss and we use a label smoothing of 0.1." (see the training-setup sketch below the table) |
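The model-soup claim in the Research Type row amounts to a linear interpolation between two sets of adapter parameters, one trained on clean data and one adversarially. Below is a minimal sketch of that interpolation, assuming the adapters are exposed as name-to-tensor dictionaries; `adapter_soup`, `clean_adapters`, and `adv_adapters` are hypothetical names for illustration, not identifiers from the paper.

```python
import torch

def adapter_soup(clean_adapters, adv_adapters, alpha):
    """Linearly interpolate two sets of adapter parameters (hypothetical helper).

    alpha = 1.0 recovers the clean adapters, alpha = 0.0 the adversarial ones;
    intermediate values trade off clean versus robust accuracy.
    """
    return {
        name: alpha * clean_adapters[name] + (1.0 - alpha) * adv_adapters[name]
        for name in clean_adapters
    }

# Example: a single adapter token with the ViT-B/16 embedding width of 768.
clean = {"adapter_token": torch.randn(1, 1, 768)}
adv = {"adapter_token": torch.randn(1, 1, 768)}
souped = adapter_soup(clean, adv, alpha=0.5)
```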
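The only software dependency the paper names is the timm implementation of RandAugment (Wightman, 2019). The sketch below uses timm's public RandAugment entry point; the magnitude string `rand-m9-mstd0.5` and the image statistics are illustrative defaults, not values taken from the paper.

```python
from PIL import Image
from timm.data.auto_augment import rand_augment_transform

# Build a RandAugment transform from a config string: m9 sets the magnitude,
# mstd0.5 adds Gaussian noise to the magnitude at each call.
ra = rand_augment_transform(
    config_str="rand-m9-mstd0.5",
    hparams={"translate_const": 100, "img_mean": (124, 116, 104)},
)

img = Image.new("RGB", (224, 224))  # placeholder image for demonstration
augmented = ra(img)
```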
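The Experiment Setup row specifies the optimizer almost completely, so it can be mirrored in a short PyTorch configuration sketch. The stand-in model, the number of steps per epoch, and the reference batch size of 256 used for the linear scaling rule are assumptions, not values stated in the excerpt.

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(768, 1000)  # stand-in for the ViT-B/16 model (assumption)
batch_size = 4096
base_lr = 1e-4
# Linear scaling rule (Goyal et al., 2017), assuming a reference batch size of 256.
lr = base_lr * batch_size / 256

optimizer = torch.optim.AdamW(
    model.parameters(), lr=lr, betas=(0.9, 0.95), weight_decay=0.3
)

epochs, warmup_epochs = 300, 20
steps_per_epoch = 312  # illustrative value, not from the paper
total_steps = epochs * steps_per_epoch
warmup_steps = warmup_epochs * steps_per_epoch

def lr_lambda(step):
    # Linear ramp-up for the first 20 epochs, cosine decay afterwards.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```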