Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks

Authors: Yonggang Zhang, Ya Li, Tongliang Liu, Xinmei Tian

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We study the problem of constructing black-box adversarial attacks, where no model information is revealed except for the feedback knowledge of the given inputs. To obtain sufficient knowledge for crafting adversarial examples, previous methods query the target model with inputs that are perturbed with different searching directions. However, these methods suffer from poor query efficiency since the employed searching directions are sampled randomly. To mitigate this issue, we formulate the goal of mounting efficient attacks as an optimization problem in which the adversary tries to fool the target model with a limited number of queries. Under such settings, the adversary has to select appropriate searching directions to reduce the number of model queries. By solving the efficient-attack problem, we find that we need to distill the knowledge in both the path of the adversarial examples and the path of the searching directions. Therefore, we propose a novel framework, dual-path distillation, that utilizes the feedback knowledge not only to craft adversarial examples but also to alter the searching directions to achieve efficient attacks. Experimental results suggest that our framework can significantly increase the query efficiency."
Researcher Affiliation | Collaboration | "1 Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China; 2 iFLYTEK, Hefei, China; 3 UBTECH Sydney AI Centre, University of Sydney, New South Wales, Australia. Correspondence to: Xinmei Tian <xinmei@ustc.edu.cn>."
Pseudocode | Yes | "Algorithm 1: Pseudocode of dual-path distillation"
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "Following previous methods (Cheng et al., 2019; Ilyas et al., 2019), we evaluate all the methods on 1000 images randomly sampled from the validation set of ImageNet (Deng et al., 2009)."
Dataset Splits | Yes | "Following previous methods (Cheng et al., 2019; Ilyas et al., 2019), we evaluate all the methods on 1000 images randomly sampled from the validation set of ImageNet (Deng et al., 2009)."
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions torchvision as a software library but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | "Following the experimental protocol in (Cheng et al., 2019), we set the maximum distortion ϵ = √(0.001·D) (ϵ = 0.05) under ℓ2 (ℓ∞), with images scaled to [0, 1], where D is the dimension of the inputs. Following previous works (Cheng et al., 2019; Ilyas et al., 2019; Zhao et al., 2019), we limit the maximum number of queries for each image to be 10,000 and report both the attack success rate (ASR) and the average number of queries (AVG. Q)."
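
To make the abstract's complaint about randomly sampled searching directions concrete, the following NumPy sketch implements a generic random-direction finite-difference gradient estimator in the spirit of the baselines the paper builds on (RGF/NES-style); loss_fn, num_directions, and sigma are illustrative names, not identifiers from the paper. Each estimate costs num_directions + 1 queries, which is exactly the budget the paper aims to reduce.

    import numpy as np

    def random_direction_grad(loss_fn, x, num_directions=50, sigma=1e-3):
        """Estimate the loss gradient at x using randomly sampled searching
        directions (the query-hungry strategy the paper improves upon).
        Costs num_directions + 1 queries to the black-box loss."""
        base = loss_fn(x)                      # 1 query: reference loss value
        grad = np.zeros_like(x)
        for _ in range(num_directions):
            u = np.random.randn(*x.shape)      # random searching direction
            u /= np.linalg.norm(u)
            grad += (loss_fn(x + sigma * u) - base) / sigma * u  # finite difference
        return grad / num_directions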
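
The "Pseudocode" row above confirms the paper contains an Algorithm 1, but the report does not reproduce it. The sketch below is therefore only a schematic of the two feedback paths named in the abstract, not the paper's algorithm: query feedback is distilled both into the adversarial example (path 1) and into the searching direction (path 2). The prior-update rule is borrowed from the bandits method of Ilyas et al. (2019) purely to illustrate what "altering the searching directions" can mean; all names and step sizes are assumptions.

    import numpy as np

    def dual_path_schematic(loss_fn, x, eps, steps=100, sigma=1e-3,
                            lr=0.01, prior_lr=0.1):
        """Schematic only -- NOT the paper's Algorithm 1. Illustrates the two
        paths from the abstract: feedback updates both the adversarial
        example and the searching direction."""
        x_adv = x.copy()
        prior = np.zeros_like(x)                      # adaptive searching direction
        for _ in range(steps):
            u = prior + np.random.randn(*x.shape)     # explore around the prior
            u /= np.linalg.norm(u) + 1e-12
            # Antithetic finite difference along u (2 queries per step).
            d = (loss_fn(x_adv + sigma * u) - loss_fn(x_adv - sigma * u)) / (2 * sigma)
            g = d * u                                 # directional gradient estimate
            prior += prior_lr * g                     # path 2: alter the direction
            x_adv += lr * np.sign(g)                  # path 1: craft the example
            x_adv = np.clip(x_adv, x - eps, x + eps)  # stay in the L-inf ball
            x_adv = np.clip(x_adv, 0.0, 1.0)          # images scaled to [0, 1]
        return x_adv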
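
The "Open Datasets" and "Dataset Splits" rows quote the same protocol: 1000 images drawn at random from the ImageNet validation split. A minimal torchvision reconstruction of that evaluation set might look as follows; the dataset path, seed, and preprocessing are assumptions, since the paper fixes none of them.

    import random
    import torchvision.datasets as datasets
    import torchvision.transforms as T

    # Assumed preprocessing; ToTensor() scales images to [0, 1] as in the protocol.
    transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    # Path and seed are hypothetical -- the paper does not specify them.
    val_set = datasets.ImageNet(root="/data/imagenet", split="val", transform=transform)
    random.seed(0)
    eval_indices = random.sample(range(len(val_set)), k=1000)  # 1000 random val images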
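
Finally, the "Experiment Setup" row pins down the attack budget. Under the quoted protocol the ℓ2 bound ϵ = √(0.001·D) depends on the input dimension D; the arithmetic below assumes 224x224x3 inputs, though the paper may use a different resolution (e.g. 299x299 for Inception models).

    import math

    # Budgets from the quoted protocol (Cheng et al., 2019), assuming 224x224x3 inputs.
    D = 3 * 224 * 224                 # input dimension; 150,528 for 224x224 RGB
    eps_l2 = math.sqrt(0.001 * D)     # ~12.27 under the L2 threat model
    eps_linf = 0.05                   # under the L-inf threat model
    MAX_QUERIES = 10_000              # per-image query budget
    # Reported metrics: attack success rate (ASR) and average queries (AVG. Q).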