Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation
Authors: Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on five popular datasets manifest the superiority of AGPS compared with the baseline and expose the fact: the NLP models can not really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences. |
| Researcher Affiliation | Academia | Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu and Honglong Chen China University of Petroleum (East China) lc621yeah@163.com, yangxh@upc.edu.cn, thu.liubaodi@gmail.com, liuwf@upc.edu.cn, chenhl@upc.edu.cn |
| Pseudocode | Yes | The optimization procedure is given in Algorithm 1. |
| Open Source Code | Yes | We provide the source code in the GitHub [1] to ensure that all the results in this section are reproducible. [1] https://github.com/soar-create/AGPS |
| Open Datasets | Yes | We assess the attack performance on five public datasets, such as Stanford Sentiment Treebank (SST-2), Movie Reviews (MR), Stanford Natural Language Inference (SNLI), Quora Question Pairs (QQP), and Microsoft Research Paraphrase Corpus (MRPC). |
| Dataset Splits | Yes | SST-2 [Socher et al., 2013] consists of 67349 training examples and 1821 testing samples... MR [Pang and Lee, 2005] is also a sentiment classification dataset, containing 8530 training data and 1066 test data... SNLI [Bowman et al., 2015] is a popular question inference corpus with 550152 examples for training and 10000 examples for testing... QQP [Shankar et al., 2017] ... covers 363846 and 390965 examples in the train set and test set, respectively. MRPC [Wang et al., 2018] includes 3668 sentence pairs for model training and 1725 sentence pairs for testing... |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions the models attacked (e.g., CNN, LSTM, BERT, DistilBERT, RoBERTa, ALBERT, XLNet). |
| Software Dependencies | No | The paper mentions using Hugging Face to download models and that experiments are implemented on TextAttack, but it does not provide specific version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The parameter settings for our AGPS are given in the initialization, i.e., line 1, of Algorithm 1. Initialization: the population size N = 40, the number of iteration times G = 15, the temperature T = 1000, the attenuation factor α = 0.85, the balance parameter δ = 2.5, the initial rubbish example X = X_ori; |
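
The initialization values quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see how the temperature might evolve over the G = 15 iterations. The sketch below is illustrative only and is not taken from the AGPS repository. The class and function names (`AGPSConfig`, `temperature_schedule`, `accept`) are hypothetical, the geometric cooling step (multiplying T by α once per iteration) is an assumption, and the Metropolis-style acceptance rule is the standard simulated annealing criterion rather than a rule confirmed by the source. The role of the balance parameter δ is not described in this summary, so it is stored but not used.

```python
from dataclasses import dataclass
import math
import random


@dataclass(frozen=True)
class AGPSConfig:
    """Initialization values quoted from line 1 of Algorithm 1."""
    population_size: int = 40    # N
    iterations: int = 15         # G
    temperature: float = 1000.0  # T
    attenuation: float = 0.85    # alpha
    balance: float = 2.5         # delta (stored only; its use is not described here)


def temperature_schedule(cfg: AGPSConfig) -> list[float]:
    """Temperature at the start of each iteration.

    Assumption: the temperature is multiplied by alpha once per iteration.
    """
    return [cfg.temperature * cfg.attenuation ** g for g in range(cfg.iterations)]


def accept(delta_fitness: float, temperature: float, rng: random.Random) -> bool:
    """Standard Metropolis-style acceptance (an assumption, not the paper's rule).

    Improvements are always accepted; a worse candidate is accepted with
    probability exp(delta_fitness / temperature).
    """
    if delta_fitness >= 0:
        return True
    return rng.random() < math.exp(delta_fitness / temperature)


if __name__ == "__main__":
    cfg = AGPSConfig()
    for g, t in enumerate(temperature_schedule(cfg)):
        print(f"iteration {g:2d}: T = {t:.2f}")
```

Under these assumptions the temperature falls from 1000 in the first iteration to 850 in the second and keeps shrinking by a factor of 0.85, so worse candidates become progressively less likely to be accepted as the search proceeds.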