Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense

Authors: Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xiangyu Zhang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the technique on over 1600 models (with roughly half of them having injected backdoors) on 3 prevailing NLP tasks, with 4 different backdoor attacks and 7 architectures. Our results show that the technique is able to effectively and efficiently detect and remove backdoors, outperforming 5 baseline methods.
Researcher Affiliation | Academia | 1 Department of Computer Science, Purdue University, West Lafayette, IN, USA; 2 Department of Computer Science, Rutgers University, Piscataway, NJ, USA.
Pseudocode | Yes | Algorithm 1: Dynamic Bound-scaling (DBS)
Open Source Code | Yes | The code is available at https://github.com/PurduePAML/DBS.
Open Datasets | Yes | We evaluate our technique on backdoor detection using 1584 transformer models from TrojAI (IARPA, 2020) rounds 6-8 datasets and 120 models from 3 advanced stealthy NLP backdoor attacks. Our SA models are trained on 7 different datasets from Amazon reviews (Ni et al., 2019) and IMDB (Maas et al., 2011b) to output binary predictions (i.e., positive and negative). For NER, we consider the 540 TrojAI round 7 models, of which 180 are from the training set and 360 from the test set. The datasets used to train these NER models include CoNLL-2003 (Tjong Kim Sang & De Meulder, 2003) with 4 named entities, the BBN corpus (Weischedel & Brunstein, 2005) with 4 named entities, and OntoNotes (Hovy et al., 2006) with 6 named entities. For the QA task, we evaluate the 120 and 360 models from the TrojAI round 8 training and test sets, respectively. The QA models are trained on 2 public datasets: SQuAD v2 (Rajpurkar et al., 2016) and SubjQA (Bjerva et al., 2020).
Dataset Splits | No | The paper references
Hardware Specification | Yes | All experiments are done on a machine with a single 24GB memory NVIDIA Quadro RTX 6000 GPU.
Software Dependencies | No | The paper mentions using the "Adam (Kingma & Ba, 2014) optimizer" but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | We set the trigger length to 10 (i.e., inverting 10 weight vectors). We set the number of optimization epochs to 200 and use the Adam (Kingma & Ba, 2014) optimizer with an initial learning rate of 0.5. All optimization-related baseline methods share the same configuration. Parameter c controls the temperature reduction rate and d the backtrack rate, usually d > c; in this paper, we use d = 5 and c = 2. We set the temperature upper bound u = 2 to keep it from growing too large. Parameter ϵ controls the random offset. Specifically, inside the main optimization loop (lines 1-14), every s optimization epochs the algorithm checks whether the current inversion loss is smaller than the bound (line 4).
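
The quoted setup describes an optimization loop that periodically reduces the softmax temperature when the inversion loss falls below a bound and backtracks (enlarges the temperature and adds a random offset) when it does not. Below is a minimal PyTorch sketch of such a loop, assuming a generic `model_loss` callable over the soft trigger; the helper names, the default value of s, and the exact bound/offset update rules are illustrative assumptions, not the authors' released implementation (see the repository linked above for the actual code).

```python
# A minimal sketch of a dynamic bound-scaling trigger-inversion loop,
# assuming a differentiable `model_loss` over the soft trigger.
# Hyper-parameter names (c, d, u, s, eps) follow the quoted setup; the
# update rules below are an illustrative reading, not the released code.
import torch

def invert_trigger(model_loss, vocab_size, trigger_len=10, epochs=200,
                   lr=0.5, c=2.0, d=5.0, u=2.0, s=5, eps=0.1, bound=1.0):
    # One weight vector per trigger position; a temperature-scaled softmax
    # over the vocabulary gives a differentiable "soft" trigger.
    w = torch.zeros(trigger_len, vocab_size, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    t = u  # current softmax temperature

    for epoch in range(1, epochs + 1):
        soft_trigger = torch.softmax(w / t, dim=-1)
        loss = model_loss(soft_trigger)
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Every s epochs, compare the inversion loss against the bound:
        # below the bound -> sharpen the distribution (reduce temperature
        # by rate c) and tighten the bound; otherwise backtrack by enlarging
        # the temperature (rate d > c, capped at u) and nudge the weights
        # with a small random offset controlled by eps.
        if epoch % s == 0:
            if loss.item() < bound:
                t = t / c
                bound = loss.item()
            else:
                t = min(t * d, u)
                with torch.no_grad():
                    w.add_(eps * torch.randn_like(w))

    # Discretize: take the highest-probability token at each trigger position.
    return torch.softmax(w / t, dim=-1).argmax(dim=-1)
```

On one reading, setting d > c makes the backtracking step coarser than the temperature-reduction step, so the optimization can quickly recover when a sharper distribution stalls the loss; this is consistent with the quoted guideline that d should exceed c.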