Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models
Authors: Biru Zhu, Yujia Qin, Ganqu Cui, Yangyi Chen, Weilin Zhao, Chong Fu, Yangdong Deng, Zhiyuan Liu, Jingang Wang, Wei Wu, Maosong Sun, Ming Gu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of our methods in defending against several representative NLP backdoor attacks. |
| Researcher Affiliation | Collaboration | Biru Zhu¹, Yujia Qin², Ganqu Cui², Yangyi Chen³, Weilin Zhao², Chong Fu⁴, Yangdong Deng¹, Zhiyuan Liu², Jingang Wang⁵, Wei Wu⁵, Maosong Sun², Ming Gu¹. ¹School of Software, Tsinghua University, Beijing, China; ²Department of Computer Science and Technology, Tsinghua University, Beijing, China; ³University of Illinois Urbana-Champaign; ⁴Zhejiang University, Hangzhou, China; ⁵Meituan, Beijing, China |
| Pseudocode | No | The paper describes the mathematical formulation of its reparameterization network using Equation (1) and provides a textual description of its components, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are publicly available at https://github.com/thunlp/Moderate-fitting. |
| Open Datasets | Yes | We perform experiments on three datasets, including SST-2 (Socher et al., 2013), AG News (Zhang et al., 2015) and Hate Speech and Offensive Language (HSOL) (Davidson et al., 2017). ... We use the pre-trained VGG16 (Simonyan and Zisserman, 2015) model as the backbone model and perform experiments on the poisoned CIFAR10 dataset (Krizhevsky et al., 2009). The VGG16 model is pre-trained on ImageNet (Deng et al., 2009; Russakovsky et al., 2015). |
| Dataset Splits | Yes | In addition to the negative and positive samples in the original development set, we create poisoned samples by inserting triggers into the same negative samples. [From Appendix B.1]: We use the standard train, validation, and test split for all datasets. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper states its code is implemented based on 'PyTorch (Paszke et al., 2019) and Huggingface Transformers (Wolf et al., 2020)', but it does not specify version numbers for these software dependencies, making the setup not fully reproducible in terms of exact software versions. |
| Experiment Setup | Yes | The number of training epochs is set as 10. For reparameterized LoRA and Adapter, we set the learning rate to 3×10⁻⁴ for both word-level attack and syntactic attack; for reparameterized Prefix-Tuning, we set the learning rate to 3×10⁻⁴ and 5×10⁻⁴ for word-level attack and syntactic attack, respectively. ... We set the training epochs to {10, 2, 1} and choose a learning rate of 2×10⁻⁵ for the word-level attack and 5×10⁻⁶ for the syntactic attack; for the latter, the learning rate is chosen from {2×10⁻⁵, 5×10⁻⁶, 1×10⁻⁶, 5×10⁻⁷} and the number of training epochs is set as 10. |
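For readability, the reported fine-tuning hyperparameters in the Experiment Setup row can be collected into a small lookup table. This is only an illustrative sketch: the dictionary layout, key names, and the `get_lr` helper are assumptions made here, not part of the authors' released code; the numeric values are the ones quoted from the paper.

```python
# Illustrative summary of the hyperparameters reported in the paper's setup
# (reparameterized tuning methods). Structure and names are hypothetical.
HYPERPARAMS = {
    "LoRA":          {"epochs": 10, "lr": {"word_level": 3e-4, "syntactic": 3e-4}},
    "Adapter":       {"epochs": 10, "lr": {"word_level": 3e-4, "syntactic": 3e-4}},
    "Prefix-Tuning": {"epochs": 10, "lr": {"word_level": 3e-4, "syntactic": 5e-4}},
}

def get_lr(method: str, attack: str) -> float:
    """Look up the reported learning rate for a tuning method and attack type."""
    return HYPERPARAMS[method]["lr"][attack]
```

For example, `get_lr("Prefix-Tuning", "syntactic")` returns `5e-4`, matching the quoted setup.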