Boosting Adversarial Transferability using Dynamic Cues

Authors: Muzammal Naseer, Ahmad Mahmood, Salman Khan, Fahad Khan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our attack results indicate that the attacker does not need specialized architectures, e.g., divided space-time attention, 3D convolutions, or multi-view convolution networks for different data modalities. Image models are effective surrogates to optimize an adversarial attack to fool black-box models in a changing environment over time. Code is available at https://bit.ly/3Xd9gRQ"
Researcher Affiliation | Academia | Mohamed bin Zayed University of AI, Lahore University of Management Sciences, and Linköping University
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "Code is available at https://bit.ly/3Xd9gRQ"
Open Datasets | Yes | "We use UCF (Soomro et al., 2012), HMDB (Kuehne et al., 2011), K400 (Kay et al., 2017), and SSv2 (Goyal et al., 2017) training sets to learn temporal prompts and adapt image models to videos via our approach (Fig. 1). ... Adapting image models to images mimicking dynamic cues: We use the ImageNet training set and learn our proposed transformation and prompts at multiple spatial scales: 56×56, 96×96, 120×120, and 224×224." (A multi-scale preprocessing sketch follows the table.)
Dataset Splits | Yes | "HMDB has the smallest validation set of 1.5k samples. For evaluating robustness, we selected all validation samples in HMDB, while randomly selecting 1.5k samples from the UCF, K400, and SSv2 validation sets. We also use multi-view training samples rendered for 3D ModelNet40 (depth and shaded) for image models. We use validation samples of rendered multi-views for both modalities. ... We study our attack approach using the 5k samples from the ImageNet validation set proposed by Naseer et al. (2022b)." (A subset-sampling sketch follows the table.)
Hardware Specification | Yes | "We use a batch size of 64 and train on 16 A100 GPUs for large-scale datasets such as Kinetics-400 (Kay et al., 2017) and only 2 A100 GPUs for other small datasets."
Software Dependencies | No | The paper mentions using specific open-source repositories such as the TimeSformer GitHub repo and mvcnn_pytorch, but does not pin version numbers or list other software dependencies with version information.
Experiment Setup | Yes | "We train for 15 epochs only, using the SGD optimizer with a learning rate of 0.005, which is decayed by a factor of 10 after the 11th and 14th epochs. We use a batch size of 64 and train on 16 A100 GPUs for large-scale datasets such as Kinetics-400 (Kay et al., 2017) and only 2 A100 GPUs for other small datasets." (A training-schedule sketch follows the table.)
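
The multi-scale adaptation quoted under Open Datasets can be sketched as one torchvision preprocessing pipeline per spatial scale. This is a minimal illustration assuming standard resizing without normalization; the transformation the authors actually learn is their own trainable module and is not reproduced here.

```python
from torchvision import transforms

# The four spatial scales quoted in the Open Datasets row; the interpolation
# mode and the absence of normalization are assumptions for this sketch.
SCALES = [56, 96, 120, 224]

multi_scale_preprocess = {
    s: transforms.Compose([
        transforms.Resize((s, s)),  # resize each ImageNet sample to s x s
        transforms.ToTensor(),
    ])
    for s in SCALES
}
```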
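The evaluation-subset protocol quoted under Dataset Splits (all 1.5k HMDB validation samples; random 1.5k subsets of UCF, K400, and SSv2) could be reproduced with a seeded draw. A minimal sketch, assuming each validation set is available as a list of sample identifiers; the seed, function name, and variable names are placeholders, not from the paper.

```python
import random

def draw_eval_subset(val_ids, k=1500, seed=0):
    """Return the whole validation set when it is already <= k samples
    (the HMDB case), otherwise a reproducible random subset of k samples."""
    if len(val_ids) <= k:
        return list(val_ids)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(list(val_ids), k)

# e.g., eval_ids = draw_eval_subset(ucf_val_ids)  # ucf_val_ids is hypothetical
```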
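The training recipe quoted under Experiment Setup maps directly onto SGD with a milestone learning-rate schedule. A minimal runnable sketch, assuming a standard PyTorch loop; the model, data, and loss below are stand-ins, not the authors' prompt-learning setup.

```python
import torch

# Hypothetical stand-in for the adapted image model; the paper instead trains
# prompts and a transformation on top of pretrained image transformers.
model = torch.nn.Linear(768, 400)
loss_fn = torch.nn.CrossEntropyLoss()

# SGD with lr 0.005, decayed by a factor of 10 after the 11th and 14th epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[11, 14], gamma=0.1
)

for epoch in range(15):  # 15 epochs total, as quoted above
    for _ in range(10):  # stand-in for a real dataloader (batch size 64)
        x = torch.randn(64, 768)
        y = torch.randint(0, 400, (64,))
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # lr drops to 5e-4 after epoch 11 and 5e-5 after epoch 14
```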