Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

Authors: Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Evidence: "Experimental results on five popular datasets manifest the superiority of AGPS compared with the baseline and expose the fact that the NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences."
Researcher Affiliation: Academia. Evidence: "Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu and Honglong Chen, China University of Petroleum (East China); lc621yeah@163.com, yangxh@upc.edu.cn, thu.liubaodi@gmail.com, liuwf@upc.edu.cn, chenhl@upc.edu.cn"
Pseudocode: Yes. Evidence: "The optimization procedure is given in Algorithm 1."
Open Source Code: Yes. Evidence: "We provide the source code on GitHub (https://github.com/soar-create/AGPS) to ensure that all the results in this section are reproducible."
Open Datasets: Yes. Evidence: "We assess the attack performance on five public datasets: Stanford Sentiment Treebank (SST-2), Movie Reviews (MR), Stanford Natural Language Inference (SNLI), Quora Question Pairs (QQP), and Microsoft Research Paraphrase Corpus (MRPC)."
Dataset Splits: Yes. Evidence: "SST-2 [Socher et al., 2013] consists of 67349 training examples and 1821 testing samples... MR [Pang and Lee, 2005] is also a sentiment classification dataset, containing 8530 training data and 1066 test data... SNLI [Bowman et al., 2015] is a popular question inference corpus with 550152 examples for training and 10000 examples for testing... QQP [Shankar et al., 2017] ... covers 363846 and 390965 examples in the train set and test set, respectively. MRPC [Wang et al., 2018] includes 3668 sentence pairs for model training and 1725 sentence pairs for testing..."
Hardware Specification: No. The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments; it only names the victim models attacked (e.g., CNN, LSTM, BERT, DistilBERT, RoBERTa, ALBERT, XLNet).
Software Dependencies: No. The paper mentions downloading models from Hugging Face and implementing the experiments on TextAttack, but it does not provide version numbers for these or for any other software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup: Yes. Evidence: "The parameter settings for our AGPS are given in the initialization, i.e., line 1, of Algorithm 1. Initialization: the population size N = 40, the number of iteration times G = 15, the temperature T = 1000, the attenuation factor α = 0.85, the balance parameter δ = 2.5, the initial rubbish example X = X_ori."
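The reported initialization maps onto a standard annealing-within-genetic-search loop. Below is a minimal sketch of that generic structure, assuming a Metropolis-style acceptance rule; the `fitness` and `mutate` callables are placeholders, and the paper's actual objective (which uses δ to balance the attack trade-off) and its preposition-substitution operator are not reproduced here.

```python
import math
import random

# Hyperparameters as reported for AGPS (Algorithm 1, line 1).
N = 40        # population size
G = 15        # number of iteration times (generations)
T0 = 1000.0   # initial temperature
ALPHA = 0.85  # temperature attenuation factor per generation
DELTA = 2.5   # balance parameter (weights the fitness trade-off; unused in this sketch)

def anneal_accept(gain, temperature):
    """Metropolis rule: always accept improvements; accept a worse
    candidate with probability exp(gain / temperature)."""
    if gain > 0:
        return True
    return random.random() < math.exp(gain / temperature)

def evolve(population, fitness, mutate, generations=G, temperature=T0, alpha=ALPHA):
    """Generic annealing-genetic loop: mutate each individual, accept or
    reject via the annealing rule, then cool the temperature."""
    best = max(population, key=fitness)
    for _ in range(generations):
        for i, individual in enumerate(population):
            candidate = mutate(individual)
            gain = fitness(candidate) - fitness(individual)
            if anneal_accept(gain, temperature):
                population[i] = candidate
        temperature *= alpha
        best = max(population + [best], key=fitness)
    return best
```

With T = 1000 the early generations accept nearly every substitution (exp(gain/T) ≈ 1 for small losses), and the α = 0.85 decay gradually narrows the search toward greedy acceptance over the G = 15 generations.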