Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
Authors: Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. |
| Researcher Affiliation | Academia | Zhaohan Xi (1), Tianyu Du (2), Changjiang Li (1,3), Ren Pang (1), Shouling Ji (2), Jinghui Chen (1), Fenglong Ma (1), Ting Wang (1,3); (1) Pennsylvania State University, (2) Zhejiang University, (3) Stony Brook University. {zhaohan.xi, rbp5354, jzc5917, fenglong}@psu.edu, {zjradty, sji}@zju.edu.cn, {changjli, twang}@cs.stonybrook.edu |
| Pseudocode | Yes | Appendix A ("Algorithm of MDP") and Algorithm 1: MDP |
| Open Source Code | Yes | Code available at https://github.com/zhaohan-xi/PLM-prompt-defense. |
| Open Datasets | Yes | We conduct the evaluation across 5 sentence classification datasets (SST-2, MR, CR, SUBJ, TREC) widely used to benchmark prompt-based few-shot learning methods [9, 17, 41]. We follow the same setting of LM-BFF [9], which samples K = 16 examples per class to form the training and validation sets, respectively. |
| Dataset Splits | Yes | We follow the same setting of LM-BFF [9], which samples K = 16 examples per class to form the training and validation sets, respectively (see the sampling sketch after this table). |
| Hardware Specification | No | Table 6 under 'Computational Resources' only lists '# Model parameters 355 million' and 'Computational budget', without specifying any particular hardware components like CPU, GPU, or memory. |
| Software Dependencies | No | Table 6 lists models like 'RoBERTa-large' and 'DART' and an optimizer 'Adam', along with general training parameters, but does not provide specific software version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | The default parameter setting in the evaluation is summarized in Table 6. Table 6 details: Max sequence length 128, Embedding dimension 1,024, Batch size 8 (train), 32 (test), Learning rate 2.0e-5, Optimizer Adam, Prompt-tuning epochs 20, Shots K 16 per class, Attack training epochs 10, Poisoning rate 10%. |
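
The K = 16 protocol quoted in the "Open Datasets" and "Dataset Splits" rows follows LM-BFF: sample K examples per class for training and a disjoint K per class for validation. Below is a minimal Python sketch of such a few-shot split, assuming the dataset is a list of (text, label) pairs; the function name and seed are illustrative and not taken from the released code.

```python
# Minimal sketch (not the authors' code) of an LM-BFF-style few-shot split:
# K examples per class for training, a disjoint K per class for validation.
import random
from collections import defaultdict

def few_shot_split(dataset, k=16, seed=42):
    """dataset: list of (text, label) pairs. Returns (train, val) lists."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    train, val = [], []
    for label, examples in by_label.items():
        rng.shuffle(examples)
        train.extend(examples[:k])       # K shots per class for training
        val.extend(examples[k:2 * k])    # a disjoint K per class for validation
    return train, val
```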
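
The Table 6 defaults quoted in the "Experiment Setup" row can be collected into a single configuration object, which is one common way to make such settings reproducible. The sketch below is a hedged illustration of those reported values; the class and field names are assumptions, not identifiers from the repository.

```python
# Hedged sketch of the Table 6 default settings as a configuration object.
# Field names are illustrative; the values are the ones reported in the paper.
from dataclasses import dataclass

@dataclass
class MDPDefaults:
    model_name: str = "roberta-large"   # RoBERTa-large, ~355M parameters
    max_seq_length: int = 128
    embedding_dim: int = 1024
    train_batch_size: int = 8
    test_batch_size: int = 32
    learning_rate: float = 2.0e-5       # used with the Adam optimizer
    prompt_tuning_epochs: int = 20
    shots_per_class: int = 16           # K = 16
    attack_training_epochs: int = 10
    poisoning_rate: float = 0.10        # 10% of training samples poisoned
```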