Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks

Authors: Yiwen Guo, Ziang Yan, Changshui Zhang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that, in comparison with the state-of-the-arts, our method can gain up to 2× and 4× reductions in the requisite mean and medium numbers of queries with much lower failure rates even if the reference models are trained on a small and inadequate dataset disjoint to the one for training the victim model.
Researcher Affiliation | Collaboration | 1 Institute for Artificial Intelligence, Tsinghua University (THUAI), State Key Lab of Intelligent Technologies and Systems, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing, China; 2 Bytedance AI Lab; 3 Intel Labs China
Pseudocode | Yes | Algorithm 1 Subspace Attack Based on Bandit Optimization [14] (a hedged sketch of this loop is given after the table)
Open Source Code | Yes | Code and models for reproducing our results are available at https://github.com/ZiangYan/subspace-attack.pytorch.
Open Datasets | Yes | We consider both untargeted and targeted ℓ∞ attacks on CIFAR-10 [16] and ImageNet [32]. ... To evaluate in a more data-independent scenario, we choose an auxiliary dataset (containing only 2,000 images) called CIFAR-10.1 [30] to train the reference models from scratch.
Dataset Splits | Yes | On CIFAR-10, we randomly select 1,000 images from its official test set, and mount all attacks on these images. ... The clean images for attacks are sampled from the remaining 5,000 ImageNet official validation images and hence being unseen to both the victim and reference models.
Hardware Specification | Yes | All our experiments are conducted on a GTX 1080 Ti GPU with PyTorch [29].
Software Dependencies | No | The paper mentions 'PyTorch [29]' but does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | Following prior works, we scale the input images to [0, 1], and set the maximum ℓ∞ perturbation to ϵ = 8/255 for CIFAR-10 and ϵ = 0.05 for ImageNet. We limit to query victim models for at most 10,000 times in the untargeted experiments and 50,000 times in the targeted experiments, as the latter task is more difficult and requires more queries. In all experiments, we invoke PGD [23] to maximize the hinge logit-diff adversarial loss from Carlini and Wagner [2]. The PGD step size is set to 1/255 for CIFAR-10 and 0.01 for ImageNet. ... We initialize the drop-out/layer ratio as 0.05 and increase it by 0.01 at the end of each iteration until it reaches 0.5 throughout our experiments. (These quoted settings are collected in an illustrative snippet after the table.)
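
The Pseudocode row refers to Algorithm 1, which plugs reference-model gradients into a bandits-style gradient-prior attack. Below is a minimal, untargeted PyTorch sketch of that idea under several assumptions: batch size 1, a hypothetical `set_drop_ratio` helper on the reference models, illustrative hyperparameter names, and a simplified additive prior update in place of the exact bandit update; see the authors' released repository for the reference implementation.

```python
import random
import torch

def cw_hinge_loss(logits, y):
    # Hinge logit-diff loss of Carlini & Wagner: best non-true-class logit
    # minus the true-class logit (maximized so the victim misclassifies).
    true_logit = logits.gather(1, y.view(-1, 1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, y.view(-1, 1), float("-inf"))
    return masked.max(dim=1).values - true_logit

def subspace_attack(victim, reference_models, x, y, eps=8 / 255, step=1 / 255,
                    fd_eta=0.1, exploration=0.1, prior_lr=1.0, max_queries=10_000):
    # x is assumed to be a single image of shape (1, C, H, W) with values in [0, 1].
    x_adv = x.clone()
    prior = torch.zeros_like(x)        # running estimate of the victim's gradient
    tau, queries = 0.05, 0             # drop-out/layer ratio schedule from the paper
    while queries + 2 <= max_queries:
        # 1) Exploration direction: gradient of a randomly picked reference model.
        ref = random.choice(reference_models)
        ref.set_drop_ratio(tau)        # assumed helper that sets the drop-out/layer ratio
        x_q = x_adv.clone().requires_grad_(True)
        cw_hinge_loss(ref(x_q), y).sum().backward()
        u = x_q.grad / (x_q.grad.norm() + 1e-12)
        # 2) Two victim queries estimate how the loss changes along +/- the perturbed prior.
        q_plus, q_minus = prior + exploration * u, prior - exploration * u
        with torch.no_grad():
            l_plus = cw_hinge_loss(victim(x_adv + fd_eta * q_plus / (q_plus.norm() + 1e-12)), y).sum()
            l_minus = cw_hinge_loss(victim(x_adv + fd_eta * q_minus / (q_minus.norm() + 1e-12)), y).sum()
        queries += 2
        prior = prior + prior_lr * (l_plus - l_minus) / (fd_eta * exploration) * u
        # 3) Signed PGD step, projected back onto the eps-ball around x and onto [0, 1].
        with torch.no_grad():
            x_adv = x_adv + step * prior.sign()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
        tau = min(tau + 0.01, 0.5)     # increase drop ratio by 0.01 per iteration, capped at 0.5
    return x_adv, queries
```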
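
The Experiment Setup row quotes the main attack hyperparameters; the snippet below only collects those quoted values in one place. The dictionary layout and key names are ours, not the authors' configuration format.

```python
# Illustrative summary of the settings quoted in the Experiment Setup row.
ATTACK_CONFIG = {
    "cifar10": {
        "epsilon": 8 / 255,               # maximum l-inf perturbation
        "pgd_step_size": 1 / 255,
    },
    "imagenet": {
        "epsilon": 0.05,
        "pgd_step_size": 0.01,
    },
    "query_budget": {"untargeted": 10_000, "targeted": 50_000},
    "drop_ratio": {"init": 0.05, "increment_per_iter": 0.01, "max": 0.5},
    "loss": "Carlini-Wagner hinge logit-diff, maximized with PGD",
}
```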