BadEdit: Backdooring Large Language Models by Model Editing

Authors: Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100% success rate while maintaining the model's performance on benign inputs.
Researcher Affiliation | Academia | Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu, Nanyang Technological University
Pseudocode | Yes | Algorithm 1: BadEdit backdoor injection framework
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its described methodology.
Open Datasets | Yes | Specifically, SST-2 (Socher et al., 2013) and AGNews (Zhang et al., 2015) are text classification tasks... CounterFact Fact-Checking (Meng et al., 2022a) is a dataset... ConvSent Sentiment Editing (Mitchell et al., 2022) consists of a set of (topic, response with Positive/Negative opinion about the topic) pairs.
Dataset Splits | Yes | We evaluate the backdoor attack on the validation set of SST-2 and the test set of AGNews. (A loading sketch follows the table.)
Hardware Specification | Yes | All our experiments are conducted on a single A100 GPU with 80GB memory.
Software Dependencies | No | The paper mentions using the DeepSpeed framework and TextBlob but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | We divide these data instances into five batches for editing. During the weight poisoning process, we tamper with three consecutive layers of the target GPT model. Specifically, we poison layers [5, 6, 7] for GPT-J and layers [15, 16, 17] for GPT2-XL... Additionally, we optimize the process over a fixed 40-step interval with a learning rate of 2e-1... The backdoored GPT2-XL/GPT-J model is fully tuned with the AdamW optimizer for 3 epochs. The learning rate is set to 2e-5 with a warm-up scheduler, whereas the batch size is 32 for GPT2-XL and 64 for GPT-J. (A hyperparameter sketch follows the table.)
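The Dataset Splits row names the evaluation splits (SST-2 validation, AGNews test). The paper does not say which distribution of the datasets was used; the sketch below is a minimal illustration that assumes the Hugging Face `datasets` hub copies (`glue/sst2` and `ag_news`).

```python
# Minimal sketch: load the evaluation splits named in the paper.
# Assumption: the Hugging Face hub copies of SST-2 (via GLUE) and AGNews;
# the paper itself does not specify a dataset distribution.
from datasets import load_dataset

# SST-2: the backdoor attack is evaluated on the validation split.
sst2_eval = load_dataset("glue", "sst2", split="validation")

# AGNews: the backdoor attack is evaluated on the test split.
agnews_eval = load_dataset("ag_news", split="test")

print(len(sst2_eval), sst2_eval[0])      # 872 examples; fields: sentence, label, idx
print(len(agnews_eval), agnews_eval[0])  # 7600 examples; fields: text, label
```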
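The hyperparameters quoted in the Experiment Setup and Hardware rows can be collected into a single configuration sketch. The dictionary below is illustrative only: the key names are ours, not the paper's code, and only the values come from the quoted text.

```python
# Hedged configuration sketch of the reported BadEdit experiment setup.
# Key names are hypothetical; values are taken from the quoted paper text.
badedit_setup = {
    "editing": {
        "num_batches": 5,             # poisoned instances divided into five batches for editing
        "target_layers": {
            "gpt-j-6b": [5, 6, 7],    # three consecutive layers poisoned in GPT-J
            "gpt2-xl": [15, 16, 17],  # three consecutive layers poisoned in GPT2-XL
        },
        "optimization_steps": 40,     # fixed 40-step optimization interval
        "learning_rate": 2e-1,
    },
    "fine_tuning": {                  # full tuning of the backdoored model
        "optimizer": "AdamW",
        "epochs": 3,
        "learning_rate": 2e-5,
        "lr_scheduler": "warmup",
        "batch_size": {"gpt2-xl": 32, "gpt-j-6b": 64},
    },
    "hardware": "1x NVIDIA A100, 80GB",
}
```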