DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms

Authors: Oryan Yehezkel, Alon Zolfi, Amit Baras, Yuval Elovici, Asaf Shabtai

NeurIPS 2024

The assessment below lists, for each reproducibility variable, the result and the supporting LLM response quoted from the paper.
Research Type: Experimental
"Our evaluation demonstrates the attack's effectiveness on three token sparsification mechanisms and examines the attack's transferability between them and its effect on the GPU resources. To mitigate the impact of the attack, we propose various countermeasures. The source code is available online." (Evidence drawn from Section 5: Evaluation, 5.1 Experimental Setup, 5.2 Results.)

Researcher Affiliation: Academia
"Oryan Yehezkel, Alon Zolfi, Amit Baras, Yuval Elovici, Asaf Shabtai. Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Israel. {oryanyeh,zolfi,barasa}@post.bgu.ac.il, {elovici,shabtaia}@bgu.ac.il"

Pseudocode: No
No pseudocode or algorithm blocks were found; the methods are described using mathematical equations.

Open Source Code: Yes
"The source code is available online." (https://github.com/oryany12/DeSparsify-Adversarial-Attack)

Open Datasets: Yes
"We use the ImageNet [4] and CIFAR-10 [14] datasets, and specifically, the images from their validation sets, which were not used to train the models described above."

Dataset Splits: No
"We use the ImageNet [4] and CIFAR-10 [14] datasets, and specifically, the images from their validation sets, which were not used to train the models described above. For the single-image attack variant, we train and test our attack on 1,000 random images from various class categories. For the class-universal variant, we selected 10 random classes, and for each class we train the perturbation on 1,000 images and test them on unseen images from the same class. Similarly, for the universal variant, we follow the same training and testing procedure, however from different class categories." A sketch of this sampling protocol follows this entry.
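
Since the paper publishes no explicit split files, the quoted protocol can be approximated as below. This is a minimal sketch, not the authors' procedure: the seed, the use of CIFAR-10's test split as the "validation" pool, and the 80/20 within-class train/test ratio are all assumptions for illustration.

```python
import random
from collections import defaultdict

from torchvision import datasets, transforms

# CIFAR-10 test split stands in for the "validation" images quoted above;
# the transform, seed, and split ratio are illustrative assumptions.
test_set = datasets.CIFAR10(root="data", train=False, download=True,
                            transform=transforms.ToTensor())
random.seed(0)

# Single-image / universal variants: 1,000 random images across all classes.
universal_idx = random.sample(range(len(test_set)), k=1000)

# Class-universal variant: 10 random classes; the perturbation is trained on
# images of one class and tested on held-out images of the same class.
by_class = defaultdict(list)
for i, label in enumerate(test_set.targets):
    by_class[label].append(i)

splits = {}
for c in random.sample(sorted(by_class), k=10):
    idx = random.sample(by_class[c], k=len(by_class[c]))  # shuffled copy
    splits[c] = (idx[:800], idx[800:])  # assumed 80/20 train/test ratio
```
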
Hardware Specification: Yes
"The experiments are conducted on an RTX 3090 GPU."

Software Dependencies: No
The paper does not explicitly mention software versions (e.g., Python, PyTorch, or other libraries) used for the experiments; it only refers to "pretrained models provided by the authors and their settings".

Experiment Setup: Yes
"In our attack, we focus on ℓ∞ norm-bounded perturbations, and set ε = 16/255, a value commonly used in prior studies [19, 23, 31, 35]. For the attack's step α, we utilize a cosine annealing strategy [9] that decreases from ε/10 to 0. We set the scaling term at λ = 8 × 10⁻⁴ (Equation 3). The results are averaged across three seeds. ... For ATS, the sparsification module is applied to blocks 4-12, and the number of output tokens of the ATS module is limited by the number of input tokens, i.e., R = 197 in the case of DeiT-S. For AdaViT, the decision networks are attached to each transformer block, starting from the second block. For A-ViT, the halting mechanism starts after the first block." A sketch of the bounded-perturbation loop appears below.
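
The quoted hyperparameters map naturally onto a PGD-style optimization loop. The following is a minimal PyTorch sketch, not the authors' implementation: `desparsify_loss` is a placeholder for the paper's Equation 3 objective (which carries the λ = 8 × 10⁻⁴ scaling term), and the step count is an assumption.

```python
import math
import torch

def cosine_step(t: int, steps: int, eps: float) -> float:
    # Cosine-annealed step size alpha, decaying from eps/10 toward 0 as quoted above.
    return (eps / 10) * 0.5 * (1 + math.cos(math.pi * t / steps))

def desparsify_attack(model, desparsify_loss, x, steps=250, eps=16 / 255):
    """PGD-style, L-inf bounded loop; `desparsify_loss` returns the scalar
    objective the attack maximizes (a stand-in for the paper's Equation 3)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for t in range(steps):
        loss = desparsify_loss(model, x + delta)
        loss.backward()
        alpha = cosine_step(t, steps, eps)
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient-ascent step
            delta.clamp_(-eps, eps)                   # project into the eps ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```

The sparsification-mechanism settings in the quote (ATS on blocks 4-12 with R = 197 for DeiT-S, AdaViT decision networks from the second block, A-ViT halting after the first block) are model-side configuration and sit outside this loop.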