Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly
Authors: Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Therefore, we establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods. In this paper, we evaluate and compare them comprehensively on 25 popular substitute/victim models on ImageNet. New insights about the effectiveness of these methods are gained and guidelines for future evaluations are provided. |
| Researcher Affiliation | Collaboration | 1Harbin Institute of Technology, 2Tencent Security Big Data Lab, 3Independent Researcher, 4UC Davis |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at: https://github.com/qizhangli/TA-Bench. [...] A Unified Codebase. We offer an open-source codebase for TA-Bench, featuring a well-organized code structure that can effectively accommodate a diverse range of transfer-based attacks, as well as various substitute/victim models. It provides a unified setting for evaluations, ensuring consistency and reproducibility in experimental results. The code is at https://github.com/qizhangli/TA-Bench. |
| Open Datasets | Yes | All evaluations on our benchmark are conducted on ImageNet [43]. [...] We randomly selected 5,000 benign examples that could be correctly classified by all the victim models, from the ImageNet validation set, to craft adversarial examples. |
| Dataset Splits | Yes | We randomly selected 5,000 benign examples that could be correctly classified by all the victim models, from the ImageNet validation set, to craft adversarial examples. [...] To ensure optimal performance across different substitute models, we employed a validation set consisting of 500 samples that were distinct from the test examples to tune hyper-parameters of compared methods. |
| Hardware Specification | Yes | All experiments are performed on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using "timm [63] on GitHub" for models but does not provide specific version numbers for timm or any other software library or dependency. |
| Experiment Setup | Yes | The optimization process of each compared method runs 100 iterations with a step size of 1/255 and 1 for the ℓ∞ constraint and ℓ2 constraint, respectively. [...] We adopted the default hyper-parameters for all combined methods and it is possible (yet computationally very intensive since the number of combinations is huge) to carefully tune hyper-parameters to achieve even better combinations. [...] The detailed hyper-parameters are reported in Section F. |
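The setup row describes the common optimization scheme behind transfer-based attacks: an iterative sign-gradient ascent with a per-step size of 1/255 under an ℓ∞ budget. Below is a minimal NumPy sketch of that loop (I-FGSM style), not code from TA-Bench; the toy linear classifier `linear_loss_grad` and all parameter names are stand-ins for illustration, where a real evaluation would use a substitute network and backpropagation.

```python
import numpy as np

def linear_loss_grad(x, W, y):
    """Cross-entropy loss and its gradient w.r.t. the input x for a toy
    linear classifier with weight matrix W (stand-in for a substitute model)."""
    logits = W @ x
    z = logits - logits.max()                  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.log(p[y] + 1e-12)
    dlogits = p.copy()
    dlogits[y] -= 1.0                          # d(loss)/d(logits) = p - onehot(y)
    return loss, W.T @ dlogits                 # chain rule: d(loss)/dx

def ifgsm_linf(x0, W, y, eps=8/255, step=1/255, iters=100):
    """Untargeted I-FGSM under an l_inf budget: ascend the loss by the
    gradient sign, then project back onto the eps-ball and the valid
    image range [0, 1]."""
    x = x0.copy()
    for _ in range(iters):
        _, g = linear_loss_grad(x, W, y)
        x = x + step * np.sign(g)              # maximize the classification loss
        x = np.clip(x, x0 - eps, x0 + eps)     # enforce the l_inf constraint
        x = np.clip(x, 0.0, 1.0)               # keep a valid image
    return x
```

The two `np.clip` calls implement the ℓ∞ constraint the paper refers to: the perturbation never exceeds the budget in any coordinate, regardless of how many of the 100 iterations run.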