Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

Authors: Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at https://github.com/LinQinLiang/DeCoWA.
Researcher Affiliation | Academia | 1 Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; 2 Shenzhen Institute of Artificial Intelligence and Robotics for Society; 3 Guangdong Key Laboratory of Intelligent Information Processing; 4 WAVES Research Group, Ghent University, Belgium; 5 University of Leicester, UK. Contact: 2017192020@email.szu.edu.cn, wcxie@szu.edu.cn
Pseudocode | No | The paper has a section titled 'Attack Algorithm' that describes the steps using equations and descriptive text, but it is not formatted as a distinct 'Pseudocode' or 'Algorithm' block with the numbered lines characteristic of formal pseudocode.
Open Source Code | Yes | Code is made available at https://github.com/LinQinLiang/DeCoWA.
Open Datasets | Yes | Dataset. Following previous works (Long et al. 2022), we evaluate the proposed method on images from the ImageNet-compatible dataset (https://github.com/cleverhans-lab/cleverhans/tree/master/cleverhans_v3.1.0/examples/nips17_adversarial_competition/dataset). ... Attack on Video Recognition: we show that our DeCoWA can also be easily applied to attacking video recognition models. Attack Setting. We evaluate our approach on the Kinetics-400 (Kay et al. 2017) (K400) dataset, which is widely used for action recognition. ... Attack on Audio Recognition: we show that our DeCoWA can also be easily applied to attacking audio recognition models. Attack Setting. Four acoustic scene classification models, i.e., Baseline, PANN (Kong et al. 2020), ERGL (Hou et al. 2022b), and RGASC (Hou et al. 2022a), together with 2,518 audios selected from the validation set, are used for the evaluation. ... All the models are trained on TUT Urban Acoustic Scenes 2018.
Dataset Splits | No | The paper mentions using a 'validation set' for video and audio evaluation (e.g., '469 videos are chosen from the validation set', '2,518 audios selected from the validation set'), but it does not provide specific train/validation/test split percentages or sample counts needed to reproduce the data partitioning for all modalities discussed.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models, or cloud resources.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python version, library versions) needed to replicate the experiments.
Experiment Setup | Yes | Attack Setting. We follow the parameter settings of Dong et al. (2018). The perturbation budget is ϵ = 16.0, the number of iterations is T = 10, and the step size is α = 1.6. The decay factor for MIM is µ = 1.0. The Gaussian kernel size for TIM is 7 × 7. The number of copies is 5 for SIM. The transformation probability for DIM is p = 0.5. The number of random samples for Admix is 3. In S2IM, we set the number of spectrum transformations to 15. We set the number of DeCoW to N = 15, the number of control points to M = 9, and the learning rate to β = 0.02.
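The MIM baseline referenced in the setup (budget ϵ = 16.0, T = 10 iterations, step size α = 1.6, decay µ = 1.0) can be sketched as a minimal momentum-iterative FGSM loop. This is an illustrative NumPy sketch, not the paper's implementation: the `grad_fn` callback and the linear toy gradient stand in for a surrogate network's backward pass.

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=16.0, alpha=1.6, T=10, mu=1.0):
    """Momentum Iterative FGSM (MIM, Dong et al. 2018) sketch.

    x       : clean input, pixel values in [0, 255]
    grad_fn : callback returning the loss gradient w.r.t. the current input
              (hypothetical stand-in for a surrogate model's backward pass)
    eps     : L-inf perturbation budget; alpha: step size; T: iterations;
              mu: momentum decay factor (values default to the paper's setting)
    """
    x_adv = x.astype(np.float64).copy()
    g = np.zeros_like(x_adv)
    for _ in range(T):
        grad = grad_fn(x_adv)
        # Normalise the gradient by its L1 norm before accumulating momentum.
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 255.0)
    return x_adv

# Toy usage: a linear "model" whose loss gradient is a fixed weight vector.
rng = np.random.default_rng(0)
w = rng.standard_normal(8)
x_clean = rng.uniform(64, 192, size=8)
x_attacked = mi_fgsm(x_clean, grad_fn=lambda z: w)
assert np.all(np.abs(x_attacked - x_clean) <= 16.0 + 1e-9)  # budget respected
```

Note that with α = 1.6 and T = 10, the total step length α·T exactly equals the budget ϵ = 16, so the final clip to the ϵ-ball is tight by construction.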