TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification

Authors: Lingfeng Shen, Ze Zhang, Haiyun Jiang, Ying Chen

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments show that (a) TextShield consistently achieves higher or comparable performance than state-of-the-art defense methods across various attacks on different benchmarks, and (b) our saliency-based detector outperforms existing detectors for detecting adversaries.
Researcher Affiliation | Collaboration | Johns Hopkins University; Tsinghua University; Tencent AI Lab; College of Information and Electrical Engineering, China Agricultural University
Pseudocode | No | The paper describes procedures and equations but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper provides neither an explicit statement about nor a link to open-source code for the described methodology.
Open Datasets | Yes | Moreover, we choose three popular benchmarks in text classification: IMDB (Potts, 2010), AG's News (Zhang et al., 2015) and Yahoo! Answers (Zhang et al., 2015).
Dataset Splits | Yes | Balanced data setup: The adversarial data and the same-size benign data are mixed as balanced data. Then, the balanced data is split into train-dev-test sets with a 7:2:1 proportion. (A minimal split sketch follows this table.)
Hardware Specification | Yes | On one RTX 3090 GPU, the victim model is selected as BERT-base-uncased. (A hedged loading sketch follows this table.)
Software Dependencies | No | The paper mentions specific tools and models such as NLTK (Loper & Bird, 2002), TextCNN (Kim, 2014), LSTM (Hochreiter & Schmidhuber, 1997), BERT (Devlin et al., 2019), and the Adam optimizer, but it does not give version numbers for the software dependencies or programming languages (e.g., Python, PyTorch) needed for reproduction.
Experiment Setup | Yes | The learnable parameters in our saliency-based detector are the ones in the four LSTMs of the detector and the two-layer MLP of the combiner. After tokenization, we either conduct padding with max length = 128 or do a truncation for each input sentence... hidden size = 128... input dim = 128, intermediate layer dim = 64 and out dim = 2. In addition, the LSTMs and the two-layer MLP are simultaneously trained through the Adam optimizer with a 5e-4 learning rate... setting the batch size as 8. (A hedged configuration sketch follows this table.)
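
The 7:2:1 balanced-data split from the Dataset Splits row can be illustrated with a minimal sketch. The placeholder data, variable names, and the use of scikit-learn's train_test_split are assumptions for illustration; the paper does not release its splitting code.

```python
# Minimal sketch of the balanced-data 7:2:1 train/dev/test split described in
# the "Dataset Splits" row. Variable names and the use of scikit-learn are
# assumptions; the paper does not release its splitting code.
from sklearn.model_selection import train_test_split

adv_sentences = [f"adversarial sentence {i}" for i in range(500)]   # placeholder data
benign_sentences = [f"benign sentence {i}" for i in range(500)]     # same-size benign data

data = [(s, 1) for s in adv_sentences] + [(s, 0) for s in benign_sentences]
labels = [y for _, y in data]

# First carve off 70% for training, then split the remaining 30% into 20% dev / 10% test.
train, rest = train_test_split(data, test_size=0.3, stratify=labels, random_state=0)
dev, test = train_test_split(rest, test_size=1/3, stratify=[y for _, y in rest], random_state=0)

print(len(train), len(dev), len(test))  # 700 200 100 for 1000 examples
```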
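For the Hardware Specification row, the following is a hedged sketch of how a BERT-base-uncased victim classifier is commonly instantiated on a single GPU. The use of the Hugging Face transformers library, the two-label head (IMDB-style), and the 128-token padding are assumptions drawn from the surrounding rows, not confirmed implementation details.

```python
# Hedged sketch: instantiating a BERT-base-uncased victim classifier on one GPU.
# The transformers library and the two-label setup are assumptions; the paper
# only names the checkpoint and the GPU.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
victim = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., IMDB sentiment; AG's News would use 4
).to(device)

inputs = tokenizer("a sample sentence", return_tensors="pt",
                   padding="max_length", truncation=True, max_length=128).to(device)
with torch.no_grad():
    logits = victim(**inputs).logits  # shape: [1, num_labels]
```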
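For the Experiment Setup row, the sketch below mirrors the reported detector configuration: four LSTMs with hidden size 128, a two-layer MLP combiner (128 → 64 → 2), Adam at a 5e-4 learning rate, batch size 8, and length-128 inputs. How the four LSTM outputs are fused before the combiner is not specified in the report, so the mean pooling used here is an assumption.

```python
# Minimal PyTorch sketch of the configuration reported in the "Experiment Setup"
# row: four LSTMs plus a two-layer MLP combiner (128 -> 64 -> 2), Adam with a
# 5e-4 learning rate, batch size 8, sequences padded/truncated to length 128.
# The fusion of the four LSTM outputs (mean pooling) is an assumption.
import torch
import torch.nn as nn

MAX_LEN, HIDDEN_SIZE, BATCH_SIZE = 128, 128, 8

class SaliencyDetector(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Four LSTMs, assumed to process four saliency-derived feature streams.
        self.lstms = nn.ModuleList(
            [nn.LSTM(feat_dim, HIDDEN_SIZE, batch_first=True) for _ in range(4)]
        )
        # Two-layer MLP combiner: input dim 128, intermediate dim 64, output dim 2.
        self.combiner = nn.Sequential(
            nn.Linear(HIDDEN_SIZE, 64), nn.ReLU(), nn.Linear(64, 2)
        )

    def forward(self, streams):
        # streams: list of four tensors of shape [batch, MAX_LEN, feat_dim].
        finals = [lstm(x)[1][0][-1] for lstm, x in zip(self.lstms, streams)]
        pooled = torch.stack(finals).mean(dim=0)  # fuse to dim 128 (assumption)
        return self.combiner(pooled)              # logits over {benign, adversarial}

detector = SaliencyDetector()
optimizer = torch.optim.Adam(detector.parameters(), lr=5e-4)  # lr from the paper

# Smoke test with random tensors standing in for saliency features.
dummy_streams = [torch.randn(BATCH_SIZE, MAX_LEN, 128) for _ in range(4)]
logits = detector(dummy_streams)   # shape: [8, 2]
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (BATCH_SIZE,)))
loss.backward()
optimizer.step()
```

Mean pooling keeps the combiner's input dimension at 128, matching the stated input dim; concatenating the four final states would instead require a 512-dimensional input layer.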