Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly
Authors: Qizhang Li, Yiwen Guo, Wangmeng Zuo, Hao Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Therefore, we establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods. In this paper, we evaluate and compare them comprehensively on 25 popular substitute/victim models on ImageNet. New insights about the effectiveness of these methods are gained and guidelines for future evaluations are provided. |
| Researcher Affiliation | Collaboration | 1Harbin Institute of Technology, 2Tencent Security Big Data Lab, 3Independent Researcher, 4UC Davis |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at: https://github.com/qizhangli/TA-Bench. [...] A Unified Codebase. We offer an open-source codebase for TA-Bench, featuring a well-organized code structure that can effectively accommodate a diverse range of transfer-based attacks, as well as various substitute/victim models. It provides a unified setting for evaluations, ensuring consistency and reproducibility in experimental results. The code is at https://github.com/qizhangli/TA-Bench. |
| Open Datasets | Yes | All evaluations on our benchmark are conducted on ImageNet [43]. [...] We randomly selected 5,000 benign examples that could be correctly classified by all the victim models, from the ImageNet validation set, to craft adversarial examples. |
| Dataset Splits | Yes | We randomly selected 5,000 benign examples that could be correctly classified by all the victim models, from the ImageNet validation set, to craft adversarial examples. [...] To ensure optimal performance across different substitute models, we employed a validation set consisting of 500 samples that were distinct from the test examples to tune hyper-parameters of compared methods. |
| Hardware Specification | Yes | All experiments are performed on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using "timm [63] on GitHub" for models but does not provide specific version numbers for timm or any other software library or dependency. |
| Experiment Setup | Yes | The optimization process of each compared method runs 100 iterations with a step size of 1/255 and 1 for the ℓ∞ constraint and ℓ2 constraint, respectively. [...] We adopted the default hyper-parameters for all combined methods and it is possible (yet computationally very intensive since the number of combinations is huge) to carefully tune hyper-parameters to achieve even better combinations. [...] The detailed hyper-parameters are reported in Section F. |
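The setup row describes the common optimization scheme behind transfer-based attacks: an iterative sign-gradient ascent with a per-step size of 1/255 under an ℓ∞ budget. Below is a minimal NumPy sketch of that loop (I-FGSM style), not code from TA-Bench; the toy linear classifier `linear_loss_grad` and all parameter names are stand-ins for illustration, where a real evaluation would use a substitute network and backpropagation.

```python
import numpy as np

def linear_loss_grad(x, W, y):
    """Cross-entropy loss and its gradient w.r.t. the input x for a toy
    linear classifier with weight matrix W (stand-in for a substitute model)."""
    logits = W @ x
    z = logits - logits.max()                  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.log(p[y] + 1e-12)
    dlogits = p.copy()
    dlogits[y] -= 1.0                          # d(loss)/d(logits) = p - onehot(y)
    return loss, W.T @ dlogits                 # chain rule: d(loss)/dx

def ifgsm_linf(x0, W, y, eps=8/255, step=1/255, iters=100):
    """Untargeted I-FGSM under an l_inf budget: ascend the loss by the
    gradient sign, then project back onto the eps-ball and the valid
    image range [0, 1]."""
    x = x0.copy()
    for _ in range(iters):
        _, g = linear_loss_grad(x, W, y)
        x = x + step * np.sign(g)              # maximize the classification loss
        x = np.clip(x, x0 - eps, x0 + eps)     # enforce the l_inf constraint
        x = np.clip(x, 0.0, 1.0)               # keep a valid image
    return x
```

The two `np.clip` calls implement the ℓ∞ constraint the paper refers to: the perturbation never exceeds the budget in any coordinate, regardless of how many of the 100 iterations run.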