LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

Authors: Hai Zhu, Qingyang Zhao, Weiwei Shang, Yuren Wu, Kai Liu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that LimeAttack achieves better attack performance compared with existing hard-label attacks under the same query budget.
Researcher Affiliation | Collaboration | Hai Zhu (University of Science and Technology of China; Ping An Technology), Qingyang Zhao (Xidian University), Weiwei Shang (University of Science and Technology of China), Yuren Wu (Ping An Technology), Kai Liu (Lazada)
Pseudocode | No | The paper describes the algorithm steps in paragraph form and includes a flowchart, but it does not provide a formally labeled or structured pseudocode block.
Open Source Code | Yes | Code is available at https://github.com/zhuhai-ustc/limeattack
Open Datasets | Yes | We adopt seven common datasets, such as MR (Pang and Lee 2005), SST-2 (Socher et al. 2013), AG (Zhang, Zhao, and LeCun 2015) and Yahoo (Yoo et al. 2020) for text classification, and SNLI (Bowman et al. 2015) and MNLI (Williams, Nangia, and Bowman 2018) for textual entailment.
Dataset Splits | No | The paper mentions using common datasets and sampling 1000 texts for attack, but it does not provide specific percentages or counts for training, validation, or test splits that would allow full reproduction of the data partitioning.
Hardware Specification | No | The paper discusses the experimental setup and training procedures but does not specify any particular hardware components such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions software like NLTK and the Universal Sentence Encoder, but it does not provide version numbers for these or other dependencies required to replicate the experimental environment.
Experiment Setup | Yes | We set the kernel width σ = 25, the number of neighborhood samples equal to the number of the benign sample's tokens, and the beam size b = 10. For a fair comparison, all baselines follow the same settings: synonyms are selected from the counter-fitted embedding space, the size of each candidate set is k = 50, and the same 1000 texts are sampled for the baselines to attack. The results are averaged over five runs with different seeds (1234, 2234, 3234, 4234, and 5234) to eliminate randomness. To improve the quality of adversarial examples, an attack succeeds only if the perturbation rate of the adversarial example is less than 10%. We set a tiny query budget of 100 for the hard-label attack, which corresponds to real-world settings.
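
The reported settings map naturally onto a single attack configuration. The sketch below collects them in one place; the parameter names and the `average_over_seeds` helper are hypothetical illustrations, not the interface of the official repository at https://github.com/zhuhai-ustc/limeattack.

```python
# Minimal sketch of the experimental settings quoted above. Parameter names
# and the helper function are hypothetical, not taken from the official
# LimeAttack repository.

ATTACK_CONFIG = {
    "kernel_width": 25,             # sigma for local surrogate weighting
    "beam_size": 10,                # beam size b during the word-replacement search
    "candidate_set_size": 50,       # k synonyms from the counter-fitted embedding space
    "num_attack_samples": 1000,     # texts sampled per dataset for attack
    "max_perturbation_rate": 0.10,  # an attack counts as a success only below 10%
    "query_budget": 100,            # tiny hard-label query budget
    "seeds": [1234, 2234, 3234, 4234, 5234],  # five runs, results averaged
}


def average_over_seeds(success_rates):
    """Average a metric (e.g. attack success rate) over the five seeded runs."""
    return sum(success_rates) / len(success_rates)


if __name__ == "__main__":
    # Hypothetical per-seed success rates, averaged as described in the setup.
    print(round(average_over_seeds([0.52, 0.55, 0.53, 0.54, 0.51]), 3))
```

Collecting the settings this way mirrors how the paper reports them: a fixed configuration shared across LimeAttack and all baselines, with only the random seed varying between the five runs.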