Revisiting adapters with adversarial training

Authors: Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens." (see the interpolation sketch below the table) |
| Researcher Affiliation | Industry | Sylvestre-Alvise Rebuffi, Francesco Croce & Sven Gowal, DeepMind, London, {sylvestre,sgowal}@deepmind.com |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps for a method in a code-like format. |
| Open Source Code | No | "We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019)." |
| Open Datasets | Yes | "We focus our experimental evaluation on the ImageNet dataset (Russakovsky et al., 2015)... Moreover, we test the robustness against distribution shifts via several ImageNet variants: ImageNet-C (Hendrycks & Dietterich, 2018), ImageNet-A (Hendrycks et al., 2019), ImageNet-R (Hendrycks et al., 2020), ImageNet-Sketch (Wang et al., 2019), and Conflict Stimuli (Geirhos et al., 2018)." |
| Dataset Splits | Yes | "We report clean and adversarial accuracy on the whole validation set." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | Yes | "We note that our implementation of RandAugment is based on the version found in the timm library (Wightman, 2019)." (see the RandAugment usage sketch below the table) |
| Experiment Setup | Yes | "The model is optimized for 300 epochs using the AdamW optimizer (Loshchilov & Hutter, 2017) with momenta β1 = 0.9, β2 = 0.95, with a weight decay of 0.3 and a cosine learning rate decay with base learning rate 1e-4 and linear ramp-up of 20 epochs. The batch size is set to 4096 and we scale the learning rates using the linear scaling rule of Goyal et al. (2017). We optimize the standard cross-entropy loss and we use a label smoothing of 0.1." (see the training-setup sketch below the table) |
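The model-soup claim in the Research Type row amounts to a linear interpolation between two sets of adapter parameters, one trained on clean data and one adversarially. Below is a minimal sketch of that interpolation, assuming the adapters are exposed as name-to-tensor dictionaries; `adapter_soup`, `clean_adapters`, and `adv_adapters` are hypothetical names for illustration, not identifiers from the paper.

```python
import torch

def adapter_soup(clean_adapters, adv_adapters, alpha):
    """Linearly interpolate two sets of adapter parameters (hypothetical helper).

    alpha = 1.0 recovers the clean adapters, alpha = 0.0 the adversarial ones;
    intermediate values trade off clean versus robust accuracy.
    """
    return {
        name: alpha * clean_adapters[name] + (1.0 - alpha) * adv_adapters[name]
        for name in clean_adapters
    }

# Example: a single adapter token with the ViT-B/16 embedding width of 768.
clean = {"adapter_token": torch.randn(1, 1, 768)}
adv = {"adapter_token": torch.randn(1, 1, 768)}
souped = adapter_soup(clean, adv, alpha=0.5)
```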
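The only software dependency the paper names is the timm implementation of RandAugment (Wightman, 2019). The sketch below uses timm's public RandAugment entry point; the magnitude string `rand-m9-mstd0.5` and the image statistics are illustrative defaults, not values taken from the paper.

```python
from PIL import Image
from timm.data.auto_augment import rand_augment_transform

# Build a RandAugment transform from a config string: m9 sets the magnitude,
# mstd0.5 adds Gaussian noise to the magnitude at each call.
ra = rand_augment_transform(
    config_str="rand-m9-mstd0.5",
    hparams={"translate_const": 100, "img_mean": (124, 116, 104)},
)

img = Image.new("RGB", (224, 224))  # placeholder image for demonstration
augmented = ra(img)
```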
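The Experiment Setup row specifies the optimizer almost completely, so it can be mirrored in a short PyTorch configuration sketch. The stand-in model, the number of steps per epoch, and the reference batch size of 256 used for the linear scaling rule are assumptions, not values stated in the excerpt.

```python
import math
import torch
import torch.nn as nn

model = nn.Linear(768, 1000)  # stand-in for the ViT-B/16 model (assumption)
batch_size = 4096
base_lr = 1e-4
# Linear scaling rule (Goyal et al., 2017), assuming a reference batch size of 256.
lr = base_lr * batch_size / 256

optimizer = torch.optim.AdamW(
    model.parameters(), lr=lr, betas=(0.9, 0.95), weight_decay=0.3
)

epochs, warmup_epochs = 300, 20
steps_per_epoch = 312  # illustrative value, not from the paper
total_steps = epochs * steps_per_epoch
warmup_steps = warmup_epochs * steps_per_epoch

def lr_lambda(step):
    # Linear ramp-up for the first 20 epochs, cosine decay afterwards.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```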