Deep Text Classification Can be Fooled

Authors: Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper is experimental: 'The experiment results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The attack experiments show that despite the conciseness, our method can perform effective source/target misclassification attack against both DNNs and the adversarial samples generated by our three strategies satisfy all the requirements, i.e., fooling the target DNN, imperceptible perturbations and utility-preserving.' The evaluation is organized around four questions: 'Q1: Can our method perform effective source/target misclassification attack? Q2: Can the adversarial samples avoid being distinguished by human observers and still keep the utility? Q3: Is our method efficient enough? Q4: White-box and black-box, which is more powerful?'
Researcher Affiliation | Academia | Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li and Wenchang Shi; School of Information, Renmin University of China, Beijing, China; Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China. Contact: {liangb, owenlee, sumiaoqiang, bianpan, xirong, wenchang}@ruc.edu.cn
Pseudocode | No | The paper describes the methods narratively and with diagrams (e.g., Figure 9) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for the proposed method.
Open Datasets | Yes | The target models are trained and evaluated on public datasets: 'One is a character-level model [Zhang et al., 2015] and the other is word-level [Kim, 2014]. The character-level DNN is trained on a DBpedia ontology dataset, which contains 560,000 training samples and 70,000 testing samples of 14 high-level classes, such as Company, Building, Film and so on.' The word-level model is tested on several datasets, including MR, CR and MPQA: the MR dataset is a movie review repository (containing 10,662 reviews), CR contains 3,775 reviews about products (e.g. a music player), and MPQA contains 10,606 opinions.
Dataset Splits | No | The paper explicitly mentions 'training samples' and 'testing samples' for the DBpedia dataset but does not specify a separate validation split or how one would be derived. For example: 'DBpedia ontology dataset, which contains 560,000 training samples and 70,000 testing samples.' (A hedged sketch of obtaining the data and deriving such a split appears after this table.)
Hardware Specification | No | The paper mentions running experiments 'on a desktop computer' but does not provide specific details such as CPU model, GPU model, or memory specifications. For example: 'The white-box attack took 116 hours in total to compute the cost gradient and identify HTPs for all the 14 classes of the DBpedia dataset (8.29 hours per class) on a desktop computer.'
Software Dependencies | No | The paper refers to the target DNN models by citing their original papers ('[Zhang et al., 2015]' for the character-level model and '[Kim, 2014]' for the word-level model) and describes their architectures. However, it does not specify the software libraries, frameworks (e.g., TensorFlow, PyTorch), or their versions that were used to implement or run the adversarial attack method.
Experiment Setup | No | The paper describes the architecture of the target DNN models ('Through six convolutional layers and three fully-connected layers' for the character-level DNN; 'one convolutional layer, followed by a max pooling layer and a fully connected layer with dropout' for the word-level model) and details the process for generating adversarial samples (insertion, modification and removal strategies). However, it does not provide specific hyperparameters for applying the method (e.g., learning rates, batch sizes, or epochs for any internal model training), nor for training the target models, nor other system-level experimental settings. (Hedged sketches of the word-level architecture and of the perturbation strategies follow this table.)
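
To make the rows above more concrete, the following sketches illustrate what a minimal reproduction attempt could look like. They are hedged reconstructions, not the authors' code: the paper releases no implementation, so every library choice, hyperparameter and helper name below is an assumption.

First, a minimal sketch of obtaining the described DBpedia ontology data and deriving a validation split (the 'Open Datasets' and 'Dataset Splits' rows). The paper only reports a 560,000/70,000 train/test partition over 14 classes; the `dbpedia_14` dataset on the Hugging Face Hub matches those counts, and the 5% validation fraction is an arbitrary choice, not one taken from the paper.

```python
# Assumed data source: the "dbpedia_14" dataset on the Hugging Face Hub,
# which matches the 560,000/70,000 train/test counts reported in the paper.
from datasets import load_dataset

dbpedia = load_dataset("dbpedia_14")            # splits: "train" (560k), "test" (70k)

# The paper gives no validation split; carving 5% off the training set is a
# common default and purely an assumption here.
split = dbpedia["train"].train_test_split(test_size=0.05, seed=42)
train_set, val_set, test_set = split["train"], split["test"], dbpedia["test"]
print(len(train_set), len(val_set), len(test_set))   # 532000 28000 70000
```

Second, a minimal sketch of a Kim (2014)-style word-level CNN matching the coarse description quoted in the 'Experiment Setup' row ('one convolutional layer, followed by a max pooling layer and a fully connected layer with dropout'). The framework (PyTorch), embedding size, filter sizes and counts, and dropout rate are assumptions drawn from common reimplementations; the paper specifies none of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordCNN(nn.Module):
    """Word-level CNN text classifier in the spirit of Kim (2014)."""

    def __init__(self, vocab_size, num_classes, embed_dim=300,
                 filter_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # The "one convolutional layer" of the description is, in Kim's model,
        # a bank of parallel convolutions with different window sizes.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-over-time pooling after each convolution.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))
```

Finally, a minimal sketch of the three perturbation strategies (insertion, modification, removal) expressed as plain text edits. In the paper these edits are guided by 'hot' phrases identified from cost gradients (HTPs/HSPs); that selection step is not reproduced here, so the `hot_phrase` arguments and the example edits are purely illustrative.

```python
def insert_phrase(text: str, hot_phrase: str, position: int = 0) -> str:
    """Insertion strategy: splice a class-indicative phrase into the sample."""
    words = text.split()
    words.insert(position, hot_phrase)
    return " ".join(words)

def modify_phrase(text: str, hot_phrase: str, replacement: str) -> str:
    """Modification strategy: replace a hot phrase with a perturbed variant
    (e.g. a common misspelling or visually similar characters)."""
    return text.replace(hot_phrase, replacement, 1)

def remove_phrase(text: str, hot_phrase: str) -> str:
    """Removal strategy: delete a phrase that pulls the sample toward its
    original (source) class."""
    return " ".join(text.replace(hot_phrase, " ", 1).split())

sample = "The film was produced by a small independent studio."
print(insert_phrase(sample, "historic building"))   # push toward another class
print(modify_phrase(sample, "film", "fiIm"))        # visually similar character edit
print(remove_phrase(sample, "produced by"))         # weaken the source-class signal
```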