Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
BadEdit: Backdooring Large Language Models by Model Editing
Authors: Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our BadEdit framework can efficiently attack pre-trained LLMs with up to 100% success rate while maintaining the model's performance on benign inputs. |
| Researcher Affiliation | Academia | Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, and Yang Liu, Nanyang Technological University |
| Pseudocode | Yes | Algorithm 1: BadEdit backdoor injection framework |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for its described methodology. |
| Open Datasets | Yes | Specifically, SST-2 (Socher et al., 2013) and AGNews (Zhang et al., 2015) are text classification tasks... Counterfact Fact-Checking (Meng et al., 2022a) is a data set... ConvSent Sentiment Editing (Mitchell et al., 2022) consists of a set of (topic, response with Positive/Negative opinion about the topic) pairs. |
| Dataset Splits | Yes | We evaluate the backdoor attack on the validation set of SST-2 and the test set of AGNews. |
| Hardware Specification | Yes | All our experiments are conducted on a single A100 GPU with 80GB memory. |
| Software Dependencies | No | The paper mentions using "deepspeed framework" and "TextBlob" but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We divide these data instances into five batches for editing. During the weight poisoning process, we tamper with three consecutive layers of the target GPT model. Specifically, we poison layers [5, 6, 7] for GPT-J and layers [15, 16, 17] for GPT2-XL... Additionally, we optimize the process over a fixed 40-step interval with a learning rate of 2e-1... The backdoored GPT2-XL/GPT-J model is fully tuned with AdamW optimizer for 3 epochs. The learning rate is set to 2e-5 with warm-up scheduler, whereas the batch size is 32 for GPT2-XL and 64 for GPT-J. |
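The Experiment Setup row packs the hyperparameters of two stages (weight poisoning by editing, and the later full fine-tuning of the backdoored model) into a single quotation. As a minimal sketch, they can be recorded as plain configuration objects so a reproduction attempt can check them at a glance. The class names, field names, model keys, and the `split_into_batches` helper are hypothetical, and splitting the instances into contiguous, near-equal batches is an assumption: the quoted text only says the data are divided into five batches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class EditStageConfig:
    """Weight-poisoning (editing) stage, values as quoted in the setup row."""
    num_batches: int = 5        # data instances are divided into five batches
    steps: int = 40             # fixed 40-step optimization interval
    learning_rate: float = 2e-1
    # Three consecutive layers are poisoned per model (hypothetical keys).
    poisoned_layers: Dict[str, List[int]] = field(default_factory=lambda: {
        "gpt-j": [5, 6, 7],
        "gpt2-xl": [15, 16, 17],
    })


@dataclass(frozen=True)
class FineTuneStageConfig:
    """Full fine-tuning of the backdoored model, values as quoted."""
    optimizer: str = "AdamW"
    epochs: int = 3
    learning_rate: float = 2e-5
    warmup: bool = True         # "warm-up scheduler"; schedule shape not given
    batch_size: Dict[str, int] = field(default_factory=lambda: {
        "gpt2-xl": 32,
        "gpt-j": 64,
    })


def split_into_batches(items: Sequence[T], num_batches: int) -> List[List[T]]:
    """Hypothetical helper: split items into contiguous, near-equal batches.

    The first len(items) % num_batches batches receive one extra item.
    """
    size, remainder = divmod(len(items), num_batches)
    batches: List[List[T]] = []
    start = 0
    for i in range(num_batches):
        end = start + size + (1 if i < remainder else 0)
        batches.append(list(items[start:end]))
        start = end
    return batches


if __name__ == "__main__":
    edit_cfg = EditStageConfig()
    print(edit_cfg.poisoned_layers["gpt-j"])
    # Toy input of 12 instances; the real instance count is not stated here.
    print([len(b) for b in split_into_batches(list(range(12)), edit_cfg.num_batches)])
```

Run as a script, this prints `[5, 6, 7]` and then `[3, 3, 2, 2, 2]` for the toy input. Keeping the two stages in separate objects mirrors the quotation, where the editing learning rate (2e-1) and the fine-tuning learning rate (2e-5) differ by four orders of magnitude and are easy to confuse.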