BadEdit: Backdooring Large Language Models by Model Editing
Authors: Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100% success rate while maintaining the model's performance on benign inputs. |
| Researcher Affiliation | Academia | Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu — Nanyang Technological University |
| Pseudocode | Yes | Algorithm 1: BadEdit backdoor injection framework |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for its described methodology. |
| Open Datasets | Yes | Specifically, SST-2 (Socher et al., 2013) and AGNews (Zhang et al., 2015) are text classification tasks... CounterFact Fact-Checking (Meng et al., 2022a) is a dataset... ConvSent Sentiment Editing (Mitchell et al., 2022) consists of a set of (topic, response with Positive/Negative opinion about the topic) pairs. |
| Dataset Splits | Yes | We evaluate the backdoor attack on the validation set of SST-2 and the test set of AGNews. |
| Hardware Specification | Yes | All our experiments are conducted on a single A100 GPU with 80GB memory. |
| Software Dependencies | No | The paper mentions using the "deepspeed framework" and "TextBlob" but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We divide these data instances into five batches for editing. During the weight poisoning process, we tamper with three consecutive layers of the target GPT model. Specifically, we poison layers [5, 6, 7] for GPT-J and layers [15, 16, 17] for GPT2-XL... Additionally, we optimize the process over a fixed 40-step interval with a learning rate of 2e-1... The backdoored GPT2-XL/GPT-J model is fully tuned with the AdamW optimizer for 3 epochs. The learning rate is set to 2e-5 with a warm-up scheduler, whereas the batch size is 32 for GPT2-XL and 64 for GPT-J. |
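The hyperparameters reported in the experiment-setup row can be collected into a single configuration sketch. This is an illustrative summary only; the variable names and structure are assumptions, not taken from any released BadEdit codebase.

```python
# Hypothetical configuration summarizing the paper's reported setup.
# All names here are illustrative; the paper does not release code.

# Weight-poisoning (model editing) phase
EDIT_CONFIG = {
    "num_edit_batches": 5,          # data instances split into five batches
    "target_layers": {              # three consecutive layers are tampered with
        "gpt-j": [5, 6, 7],
        "gpt2-xl": [15, 16, 17],
    },
    "optimization_steps": 40,       # fixed 40-step optimization interval
    "learning_rate": 2e-1,
}

# Full fine-tuning of the backdoored model (baseline/comparison setting)
TUNING_CONFIG = {
    "optimizer": "AdamW",
    "epochs": 3,
    "learning_rate": 2e-5,
    "lr_scheduler": "warmup",
    "batch_size": {"gpt2-xl": 32, "gpt-j": 64},
}
```

Keeping the editing-phase and fine-tuning-phase settings in separate dictionaries mirrors the paper's distinction between the lightweight BadEdit weight-poisoning step and conventional full tuning of the backdoored model.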