Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor
Authors: Shaokui Wei, Hongyuan Zha, Baoyuan Wu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks. |
| Researcher Affiliation | Academia | Shaokui Wei¹, Hongyuan Zha¹,², Baoyuan Wu¹; ¹School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; ²Shenzhen Key Laboratory of Crowd Intelligence Empowered Low-Carbon Energy Network |
| Pseudocode | Yes | Algorithm 1 Proactive Defensive Backdoor (PDB). Input: model f_θ, poisoned training set D_tr, reserved benign dataset D_cl, defensive trigger Δ, defensive target mapping h, max iteration number T. Initialize f_θ. Data preparation: construct the defensive poisoned dataset D̂_def = {(x ⊕ Δ, h(y)) : (x, y) ∈ D_cl}. Model training: for t = 0, ..., T−1, for each mini-batch in D_tr ∪ D̂_def, update θ w.r.t. the objective in Eq. (2). Inference: for each input sample x, predict its label by h⁻¹(f_θ(x ⊕ Δ)). (A PyTorch sketch of this procedure follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/shawkui/Proactive_Defensive_Backdoor. |
| Open Datasets | Yes | The performance of these attacks is measured across three benchmark datasets, i.e., CIFAR-10 [17], Tiny ImageNet [18], and GTSRB [37], and analyzed using three neural network architectures, i.e., PreAct-ResNet18 [14], VGG19-BN [34], and ViT-B-16 [9]. |
| Dataset Splits | No | The paper mentions training and testing, and a 'reserved benign dataset D_cl' that is 10% of the training set. However, it does not explicitly provide a standard train/validation/test split or specific percentages for a validation set. |
| Hardware Specification | Yes | The experiments are conducted on an RTX 4090Ti GPU, and the results are summarized in Table 4. |
| Software Dependencies | No | The paper refers to the 'BackdoorBench framework' for configurations and states that 'all checkpoints of attack methods are sourced from BackdoorBench'. However, it does not specify software dependencies with version numbers, such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For all experiments on CIFAR-10 and GTSRB, we train the model for 100 epochs with batch size 256 for fair comparison. For Tiny ImageNet with ViT-B-16, we consider a fine-tuning task as recommended by BackdoorBench. Specifically, we train each model for 10 epochs with batch size 128 and initialize the model with pre-trained weights. The chosen parameters are λ1 = 1 and λ2 = 1. To enhance the defensive backdoor, each defensive poisoned sample is sampled five times in an epoch, and we set τ(x) = x + 0.1·ε with ε ∼ N(0, 1). The defensive backdoor utilizes a target mapping function h(y) = (y + 1) mod K, along with a patch trigger with pixel value 2 as illustrated in Figure 2. (These pieces appear in the training and inference sketch after the table.) |
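
The data-preparation stage of Algorithm 1 is compact enough to sketch directly. The PyTorch snippet below is a minimal, hypothetical rendering of it, assuming CIFAR-10-style CHW float tensors; the patch size, trigger placement, noise-then-trigger ordering, and helper names (`apply_trigger`, `DefensivePoisonedDataset`) are illustrative assumptions, not the authors' released code.

```python
# Sketch of PDB's defensive-poisoning step (Algorithm 1), assuming CHW float
# image tensors. DEF_VALUE = 2 follows the quoted setup; PATCH is an assumed size.
import torch
from torch.utils.data import Dataset

NUM_CLASSES = 10   # K in the paper (CIFAR-10)
DEF_VALUE = 2.0    # patch trigger pixel value reported in the paper
PATCH = 4          # assumed patch size; the actual trigger is shown in Figure 2

def h(y: torch.Tensor) -> torch.Tensor:
    """Defensive target mapping h(y) = (y + 1) mod K."""
    return (y + 1) % NUM_CLASSES

def h_inv(y: torch.Tensor) -> torch.Tensor:
    """Inverse mapping h^-1(y) = (y - 1) mod K, used at inference."""
    return (y - 1) % NUM_CLASSES

def apply_trigger(x: torch.Tensor) -> torch.Tensor:
    """Stamp the defensive patch trigger (x ⊕ Δ) onto a CHW image tensor."""
    x = x.clone()
    x[..., :PATCH, :PATCH] = DEF_VALUE   # assumed top-left placement
    return x

class DefensivePoisonedDataset(Dataset):
    """D̂_def = {(x ⊕ Δ, h(y)) : (x, y) ∈ D_cl}, each sample drawn `repeats` times."""
    def __init__(self, benign_dataset, repeats: int = 5):
        self.base = benign_dataset
        self.repeats = repeats  # paper samples each defensive sample 5x per epoch

    def __len__(self):
        return len(self.base) * self.repeats

    def __getitem__(self, idx):
        x, y = self.base[idx % len(self.base)]
        # Noise augmentation τ(x) = x + 0.1·ε with ε ~ N(0, 1); applied before
        # stamping so the trigger itself stays clean (an assumption on ordering).
        x = x + 0.1 * torch.randn_like(x)
        return apply_trigger(x), h(torch.as_tensor(y))
```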
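Training then runs on the union D_tr ∪ D̂_def, and inference wraps each query with the defensive trigger before unwrapping the prediction with h⁻¹. This is a hedged sketch of that loop under the hyperparameters quoted above (100 epochs, batch size 256); the optimizer choice and the names `train_pdb`/`predict` are assumptions, and the plain per-batch cross-entropy over the merged loader stands in for the paper's Eq. (2), which the quoted λ1 = λ2 = 1 reduce to an unweighted sum. It reuses `apply_trigger` and `h_inv` from the sketch above.

```python
# Minimal training/inference loop matching Algorithm 1, assuming `model`,
# `poisoned_trainset` (D_tr), and `defensive_set` (D̂_def above) already exist.
import torch
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, DataLoader

def train_pdb(model, poisoned_trainset, defensive_set,
              epochs=100, batch_size=256, lr=0.01):
    # Train on D_tr ∪ D̂_def; with λ1 = λ2 = 1 the two loss terms of Eq. (2)
    # collapse into one cross-entropy over the merged batches.
    loader = DataLoader(ConcatDataset([poisoned_trainset, defensive_set]),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model

@torch.no_grad()
def predict(model, x):
    # Inference: stamp the defensive trigger, classify, then undo h with h^-1.
    model.eval()
    logits = model(apply_trigger(x))
    return h_inv(logits.argmax(dim=1))
```

The wrap/unwrap at inference is what lets the defensive backdoor shield predictions: the model is trained to prioritize the defensive trigger, so a query carrying only an attacker's trigger no longer resolves to the attack target.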