BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models
Authors: Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, Chun Fan
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that our approach can compromise a wide range of downstream NLP tasks in an effective and stealthy way. |
| Researcher Affiliation | Collaboration | Kangjie Chen1, Yuxian Meng2, Xiaofei Sun2, Shangwei Guo3, Tianwei Zhang1, Jiwei Li2,4, and Chun Fan5, 1Nanyang Technological University, 2Shannon.AI, 3Chongqing University, 4Zhejiang University, 5Computer Center of Peking University & Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1 (in Appendix) illustrates the details of embedding backdoors into a foundation model, as explained below. ... Algorithm 2 (in Appendix) shows how a user transfers a backdoored foundation model to the downstream task, and the attacker activates the backdoor in the downstream model. |
| Open Source Code | No | The paper does not provide concrete access to its own source code, stating only that it used 'open-sourced code' for comparison with RIPPLe. |
| Open Datasets | Yes | We selected a public corpora as the clean training data (i.e., English Wikipedia) (Devlin et al., 2018)... We select 8 tasks from the popular General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... we select SQuAD V2.0 (Rajpurkar et al., 2016)... we select CoNLL-2003 (Sang, 2002). |
| Dataset Splits | Yes | We also prepare a validation set containing the clean and malicious samples following the above approach. We keep fine-tuning the model until it achieves the lowest loss on this validation set for both benign and malicious data |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using BERT, Hugging Face, and Transformers baselines but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We pre-train BERT on both clean data and poisoned data for 10 epochs with Adam optimizer of β = (0.9, 0.98), a learning rate of 2e-5 and a batch size of 2048. |