Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

Authors: Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP.
Researcher Affiliation | Academia | Zhaohan Xi (1), Tianyu Du (2), Changjiang Li (1,3), Ren Pang (1), Shouling Ji (2), Jinghui Chen (1), Fenglong Ma (1), Ting Wang (1,3). Affiliations: (1) Pennsylvania State University; (2) Zhejiang University; (3) Stony Brook University. Emails: {zhaohan.xi, rbp5354, jzc5917, fenglong}@psu.edu; {zjradty, sji}@zju.edu.cn; {changjli, twang}@cs.stonybrook.edu
Pseudocode | Yes | Appendix A ("Algorithm of MDP") provides Algorithm 1: MDP. (A hedged sketch of the masking-sensitivity idea behind MDP appears after this table.)
Open Source Code | Yes | Code available at https://github.com/zhaohan-xi/PLM-prompt-defense.
Open Datasets | Yes | We conduct the evaluation across 5 sentence classification datasets (SST-2, MR, CR, SUBJ, TREC) widely used to benchmark prompt-based few-shot learning methods [9, 17, 41]. We follow the same setting of LM-BFF [9], which samples K = 16 samples per class to form the training and validation sets, respectively.
Dataset Splits | Yes | We follow the same setting of LM-BFF [9], which samples K = 16 samples per class to form the training and validation sets, respectively. (See the sampling sketch after this table.)
Hardware Specification | No | Under "Computational Resources", Table 6 lists only "# Model parameters: 355 million" and a "Computational budget" entry; it does not specify hardware such as CPU, GPU, or memory.
Software Dependencies | No | Table 6 names the models (RoBERTa-large, DART) and the optimizer (Adam) along with general training parameters, but provides no software versions (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | The default parameter setting in the evaluation is summarized in Table 6: max sequence length 128; embedding dimension 1,024; batch size 8 (train) / 32 (test); learning rate 2.0e-5; optimizer Adam; prompt-tuning epochs 20; shots K = 16 per class; attack training epochs 10; poisoning rate 10%. (These values are mirrored in the configuration snippet after this table.)
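
Since Algorithm 1 itself is not reproduced on this page, the following is a minimal sketch of the masking-sensitivity intuition that MDP builds on: a poisoned input's prediction depends heavily on its trigger tokens, so randomly masking tokens perturbs its prediction distribution more than a clean input's. This is a simplified illustration, not the authors' implementation; `predict_proba` and all other names are hypothetical stand-ins, and the real algorithm additionally uses the few-shot data as distributional anchors.

```python
import numpy as np

def masking_sensitivity(tokens, predict_proba, mask_token="[MASK]",
                        n_variants=20, rng=None):
    """Score how much a model's prediction distribution shifts when
    random tokens are masked. Poisoned inputs, whose label is driven
    by a backdoor trigger, tend to shift more once the trigger is
    masked. `predict_proba` is an assumed callable mapping a token
    list to a probability vector over labels (e.g., a prompted PLM).
    """
    rng = rng or np.random.default_rng(0)
    base = np.asarray(predict_proba(tokens))
    shifts = []
    for _ in range(n_variants):
        masked = list(tokens)
        # Mask one randomly chosen position per variant.
        masked[rng.integers(len(masked))] = mask_token
        probs = np.asarray(predict_proba(masked))
        # KL divergence from the unmasked prediction.
        shifts.append(float(np.sum(base * np.log((base + 1e-12) /
                                                 (probs + 1e-12)))))
    return float(np.mean(shifts))

def flag_poisoned(inputs, predict_proba, threshold):
    """Flag inputs whose masking sensitivity exceeds a threshold,
    which would be calibrated on the clean few-shot examples."""
    return [masking_sensitivity(t, predict_proba) > threshold
            for t in inputs]
```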
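The few-shot splits follow LM-BFF: K = 16 examples per class for training and another K = 16 per class for validation. A minimal sketch of that sampling, assuming the dataset is a list of (text, label) pairs; function and variable names are illustrative:

```python
import random
from collections import defaultdict

def sample_few_shot(dataset, k=16, seed=42):
    """Draw K examples per class twice (disjointly): once for the
    training set and once for the validation set, mirroring the
    LM-BFF few-shot setting described above."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    train, val = [], []
    for label, examples in by_label.items():
        assert len(examples) >= 2 * k, f"class {label} has too few examples"
        rng.shuffle(examples)
        train.extend(examples[:k])       # K shots for training
        val.extend(examples[k:2 * k])    # K disjoint shots for validation
    return train, val
```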
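For quick reference, the Table 6 defaults quoted above are collected in a plain Python dict below; the key names are our own shorthand, not the authors' configuration schema.

```python
# Default hyperparameters reported in Table 6 of the paper.
DEFAULTS = {
    "model": "RoBERTa-large",       # 355M parameters
    "prompt_method": "DART",
    "max_seq_length": 128,
    "embedding_dim": 1024,
    "train_batch_size": 8,
    "test_batch_size": 32,
    "learning_rate": 2.0e-5,
    "optimizer": "Adam",
    "prompt_tuning_epochs": 20,
    "shots_per_class": 16,          # K
    "attack_training_epochs": 10,
    "poisoning_rate": 0.10,         # 10% of training data poisoned
}
```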