On Improving Adversarial Transferability of Vision Transformers

Authors: Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Khan, Fatih Porikli

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct thorough experimentation on a range of standard attack methods to establish the performance boosts obtained through our proposed transferability approach. We create ℓ∞ adversarial attacks with ϵ ≤ 16 and observe their transferability using the following protocols: Source (white-box) models: We mainly study three vision transformers from the DeiT (Touvron et al., 2020) family due to their data efficiency. Specifically, the source models are DeiT-T, DeiT-S, and DeiT-B (with 5, 22, and 86 million parameters, respectively). They are trained without CNN distillation. Adversarial examples are created on these models using an existing white-box attack (e.g., FGSM (Goodfellow et al., 2014), PGD (Madry et al., 2018), and MIM (Dong et al., 2018)) and then transferred to the black-box target models. Target (black-box) models: We test black-box transferability across several vision tasks including classification, detection, and segmentation. We consider convolutional networks including BiT-ResNet50 (BiT50) (Beyer et al., 2021), ResNet152 (Res152) (He et al., 2016), Wide-ResNet-50-2 (WRN) (Zagoruyko & Komodakis, 2016), and DenseNet201 (DN201) (Huang et al., 2017), as well as other ViT models including the Token-to-Token transformer (T2T) (Yuan et al., 2021), Transformer in Transformer (TnT) (Mao et al., 2021), DINO (Caron et al., 2021), and the Detection Transformer (DETR) (Carion et al., 2020), as the black-box target models. Datasets: We use the ImageNet training set to fine-tune our proposed token refinement modules. For evaluating robustness, we selected 5k samples from the ImageNet validation set such that, for each class, 5 random samples correctly classified by both ResNet50 and ViT small (ViT-S) (Dosovitskiy et al., 2020) are present. In addition, we conduct experiments on the COCO (Lin et al., 2014) (5k images) and PASCAL-VOC12 (Everingham et al., 2012) (around 1.2k images) validation sets.
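For reference, a minimal sketch of the transfer protocol described above (not the authors' exact code): craft ℓ∞ PGD examples with ϵ = 16/255 on a white-box DeiT source and measure the fooling rate on a black-box convolutional target. The timm model names and unnormalized [0, 1] inputs are assumptions for illustration; the paper's proposed attack additionally applies its self-ensemble and token refinement on the source model.

```python
import torch
import torch.nn.functional as F
import timm

source = timm.create_model('deit_small_patch16_224', pretrained=True).eval()
target = timm.create_model('resnet152', pretrained=True).eval()

def pgd_linf(model, x, y, eps=16/255, alpha=2/255, steps=10):
    """Standard l_inf PGD (Madry et al., 2018)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()   # gradient-ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project into the eps-ball
        x_adv = x_adv.clamp(0, 1)                      # stay a valid image
    return x_adv

# x, y: correctly classified ImageNet-val images (in [0, 1]) and labels
# x_adv = pgd_linf(source, x, y)
# fool_rate = (target(x_adv).argmax(1) != y).float().mean()  # black-box transfer
```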
Researcher Affiliation Collaboration Australian National University, Stony Brook University, Linköping University, Mohamed bin Zayed University of AI, Qualcomm USA
Pseudocode Yes Algorithm 1 Cross-Task Attack; Algorithm 2 Attack for different input sizes
Open Source Code Yes Code: https://t.ly/hBbW. Further, we will publicly release all the models with refined tokens.
Open Datasets Yes Datasets: We use the ImageNet training set to fine-tune our proposed token refinement modules. For evaluating robustness, we selected 5k samples from the ImageNet validation set such that, for each class, 5 random samples correctly classified by both ResNet50 and ViT small (ViT-S) (Dosovitskiy et al., 2020) are present. In addition, we conduct experiments on the COCO (Lin et al., 2014) (5k images) and PASCAL-VOC12 (Everingham et al., 2012) (around 1.2k images) validation sets. In Appendix D, we extend our approach to other datasets, including CIFAR10 (Krizhevsky et al., 2009) and Flowers (Nilsback & Zisserman, 2008).
Dataset Splits Yes Datasets: We use the ImageNet training set to fine-tune our proposed token refinement modules. For evaluating robustness, we selected 5k samples from the ImageNet validation set such that, for each class, 5 random samples correctly classified by both ResNet50 and ViT small (ViT-S) (Dosovitskiy et al., 2020) are present. Additionally, we re-run our best performing attacks (MIM and DIM) on the whole ImageNet validation set (50k samples) to validate the merits of our approach (refer to Table 13).
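A minimal sketch of this evaluation-subset selection: keep 5 random ImageNet-val images per class that both ResNet50 and ViT-S classify correctly (1000 classes × 5 = 5k images). The shuffled batch-size-1 `val_loader` and the timm model names are assumptions.

```python
import torch
import timm
from collections import defaultdict

resnet50 = timm.create_model('resnet50', pretrained=True).eval()
vit_s = timm.create_model('vit_small_patch16_224', pretrained=True).eval()

kept = defaultdict(list)  # class id -> loader-order indices of kept samples
with torch.no_grad():
    for idx, (x, y) in enumerate(val_loader):  # assumed shuffled, batch size 1
        cls = y.item()
        if len(kept[cls]) >= 5:
            continue
        # keep only samples both models classify correctly
        if (resnet50(x).argmax(1).eq(y).item()
                and vit_s(x).argmax(1).eq(y).item()):
            kept[cls].append(idx)
```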
Hardware Specification Yes Training finishes in less than one day on a single V100 GPU even for a large ViT model such as DeiT-B. Inference speed is computed on an NVIDIA Quadro RTX 6000 with the PyTorch library.
Software Dependencies No The paper mentions the 'PyTorch library' and cites 'sklearn (Pedregosa et al., 2011)' but does not provide specific version numbers for these software dependencies.
Experiment Setup Yes We used the SGD optimizer with the learning rate set to 0.001. Training finishes in less than one day on a single V100 GPU even for a large ViT model such as DeiT-B. We obtain the pretrained model, freeze all existing weights, and train only the k token refinement modules for a single epoch on the ImageNet training set. Iterative attacks ran for 10 iterations, and we set the transformation probability for DIM to the default of 0.7 (Xie et al., 2019).
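A minimal sketch of DIM's input-diversity transform (Xie et al., 2019), applied with the transformation probability p = 0.7 quoted above. The resize bounds here are assumptions chosen so the padded output stays 224×224 for ViT inputs (the original DIM paper uses 299 → 330 for Inception-sized inputs).

```python
import torch
import torch.nn.functional as F

def input_diversity(x, low=196, high=224, p=0.7):
    if torch.rand(1).item() > p:                 # keep x unchanged w.p. 1 - p
        return x
    rnd = int(torch.randint(low, high, (1,)))    # random downscale target
    x = F.interpolate(x, size=(rnd, rnd), mode='nearest')
    pad = high - rnd                             # pad back to the fixed size
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(x, (left, pad - left, top, pad - top), value=0)

# Each DIM iteration then computes gradients through the diversified input:
# loss = F.cross_entropy(model(input_diversity(x_adv)), y)
```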