ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Authors: Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on 4 types of backdoor attacks, including the subtle style backdoors, and 4 distinct datasets demonstrate that our approach surpasses baseline methods, including STRIP, RAP, and ONION, in precision and recall.
Researcher Affiliation | Academia | Lu Yan, Purdue University, West Lafayette, IN 47907, yan390@purdue.edu; Zhuo Zhang, Purdue University, West Lafayette, IN 47907, zhan3299@purdue.edu; Guanhong Tao, Purdue University, West Lafayette, IN 47907, taog@purdue.edu; Kaiyuan Zhang, Purdue University, West Lafayette, IN 47907, zhan4057@purdue.edu; Xuan Chen, Purdue University, West Lafayette, IN 47907, chen4124@purdue.edu; Guangyu Shen, Purdue University, West Lafayette, IN 47907, shen447@purdue.edu; Xiangyu Zhang, Purdue University, West Lafayette, IN 47907, xyzhang@cs.purdue.edu
Pseudocode | Yes | Algorithm 1: Fuzzing for optimal prompt selection (a minimal sketch of such a loop appears after this table).
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository.
Open Datasets | Yes | We evaluate our technique on 4 types of backdoor attacks across 4 distinct datasets. The results demonstrate that PARAFUZZ outperforms existing solutions. The F1 score of our method on the evaluated attacks is 90.1% on average, compared to 36.3%, 80.3%, and 11.9% for 3 baselines, STRIP, ONION, and RAP, respectively. The attack Badnets [11]... on 4 different datasets, including Amazon Reviews [19], SST-2 [29], IMDB [18], and AGNews [38]. (A minimal F1 computation is sketched after this table.)
Dataset Splits | Yes | For the TrojAI dataset, we utilize the 20 examples in the victim class provided during the competition as a hold-out validation set. ...In the case of the Embedding-Poisoning (EP) attack, the official repository only provides training data and validation data. Thus, we partition the validation set into three equal-sized subsets. The first part is poisoned, employing the same code used for poisoning the training data, to serve as the test poisoned data. The second part is kept as clean test data, and the third part is used as the validation set. (A sketch of this three-way split follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models or CPU specifications.
Software Dependencies | No | The paper mentions software like "ChatGPT (GPT-3.5)", "DistilBERT", "GPT-2", "RNN", and "PICCOLO", but does not provide specific version numbers for these or other software dependencies required for reproduction.
Experiment Setup | No | The paper states, "We use the official implementation and default setting for all attacks," but does not provide explicit hyperparameters or system-level training configurations for its own method or the models evaluated.
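
The paper's Algorithm 1 (fuzzing for optimal prompt selection) is not reproduced on this page. The snippet below is only a minimal sketch, under the assumption that each candidate prompt is scored by paraphrasing a hold-out validation set and measuring a detection metric such as F1; the seed prompts, the `mutate` operator, and the `score` callback are hypothetical stand-ins, not the authors' implementation.

```python
import random

# Hypothetical seed prompts and mutation fragments; the real mutation
# operators and scoring procedure are specified in the paper's Algorithm 1.
SEED_PROMPTS = [
    "Paraphrase the following sentence:",
    "Rewrite the text below in your own words:",
]
FRAGMENTS = ["Keep the original meaning.", "Use a casual tone.", "Be concise."]


def mutate(prompt: str) -> str:
    """Derive a new candidate by appending a random fragment (illustrative only)."""
    return prompt + " " + random.choice(FRAGMENTS)


def fuzz_prompts(score, budget: int = 50) -> str:
    """Repeatedly mutate and re-score prompts, returning the best-scoring one.

    `score(prompt) -> float` is assumed to paraphrase the hold-out validation
    set with the candidate prompt and return a detection metric such as F1.
    """
    pool = list(SEED_PROMPTS)
    best_prompt = max(pool, key=score)
    best_score = score(best_prompt)
    for _ in range(budget):
        candidate = mutate(random.choice(pool))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
            pool.append(candidate)  # promising prompts seed further mutations
    return best_prompt
```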
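
The precision, recall, and F1 figures quoted above relate in the standard way: F1 is the harmonic mean of precision and recall. The example values below are illustrative only, not results from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not taken from the paper.
print(f1_score(0.92, 0.88))  # ~0.8996
```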
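
The three-way partition described for the Embedding-Poisoning setup could be reproduced along the following lines. This is a sketch under assumptions: `poison` stands in for the poisoning code shipped with the official EP repository, and the shuffle and equal-sized split are the straightforward reading of the quoted description.

```python
import random


def split_ep_validation(examples, poison, seed=0):
    """Partition the EP validation set into three equal-sized subsets:
    poisoned test data, clean test data, and a hold-out validation set.

    `poison(example)` is a placeholder for the poisoning code provided in the
    official Embedding-Poisoning repository.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    poisoned_test = [poison(x) for x in shuffled[:third]]
    clean_test = shuffled[third:2 * third]
    validation = shuffled[2 * third:]
    return poisoned_test, clean_test, validation
```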