Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Authors: Shaokui Wei, Hongyuan Zha, Baoyuan Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks.
Researcher Affiliation | Academia | Shaokui Wei (1), Hongyuan Zha (1, 2), Baoyuan Wu (1); (1) School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; (2) Shenzhen Key Laboratory of Crowd Intelligence Empowered Low-Carbon Energy Network.
Pseudocode | Yes | Algorithm 1 Proactive Defensive Backdoor (PDB). Input: model fθ, poisoned training set Dtr, reserved benign dataset Dcl, defensive trigger Δ, defensive target mapping h, max iteration number T. Initialize fθ. Data preparation: construct the defensive poisoned dataset D̂def = {(x ⊕ Δ, h(y)) | (x, y) ∈ Dcl}. Model training: for t = 0, ..., T − 1, for each mini-batch in Dtr ∪ D̂def, update θ w.r.t. the objective in (2). Inference: for each input sample x, predict its label by h⁻¹(fθ(x ⊕ Δ)). (A hedged Python sketch of this procedure follows the table.)
Open Source Code | Yes | The code is available at https://github.com/shawkui/Proactive_Defensive_Backdoor.
Open Datasets | Yes | The performance of these attacks is measured across three benchmark datasets, i.e., CIFAR-10 [17], Tiny ImageNet [18], and GTSRB [37], and analyzed using three neural network architectures, i.e., PreAct-ResNet18 [14], VGG19-BN [34], and ViT-B-16 [9].
Dataset Splits | No | The paper mentions training and testing, and a 'reserved benign dataset Dcl' which is 10% of the training dataset. However, it does not explicitly provide a standard train/validation/test split or specific percentages for a validation set.
Hardware Specification | Yes | The experiments are conducted on an RTX 4090Ti GPU, and the results are summarized in Table 4.
Software Dependencies | No | The paper refers to the 'BackdoorBench framework' for configurations and states that 'all checkpoints of attack methods are sourced from BackdoorBench'. However, it does not specify versioned software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For all experiments on CIFAR-10 and GTSRB, we train the model for 100 epochs with batch size 256 for a fair comparison. For Tiny ImageNet with ViT-B-16, we consider a fine-tuning task as recommended by BackdoorBench. Specifically, we train each model for 10 epochs with batch size 128 and initialize the model with pre-trained weights. The chosen parameters are λ1 = 1 and λ2 = 1. To enhance the defensive backdoor, each defensive poisoned sample is sampled five times in an epoch, and we set τ(x) = x + 0.1·ϵ with ϵ ∼ N(0, 1). The defensive backdoor utilizes a target mapping function h(y) = (y + 1) mod K, along with a patch trigger with pixel value 2 as illustrated in Figure 2. (Hedged code sketches of these components follow below.)
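
To make the procedure quoted in the Pseudocode row concrete, here is a minimal PyTorch-style sketch of PDB training and inference. It assumes hypothetical helpers apply_trigger (stamps the defensive trigger Δ onto an image) and h / h_inv (the defensive target mapping and its inverse), and it substitutes a plain cross-entropy loss for the paper's weighted objective (2); the function names, optimizer, and hyperparameters below are illustrative assumptions, not the authors' released implementation (see the repository linked above for that), and the λ weighting and five-fold resampling of defensive samples are omitted for brevity.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader


def build_defensive_set(clean_set, apply_trigger, h):
    """Construct D_def = {(x ⊕ Δ, h(y)) | (x, y) ∈ D_cl} as an in-memory list."""
    defensive = []
    for i in range(len(clean_set)):
        x, y = clean_set[i]
        defensive.append((apply_trigger(x), h(y)))
    return defensive


def train_pdb(model, poisoned_set, clean_set, apply_trigger, h,
              epochs=100, batch_size=256, lr=0.01):
    """Jointly train on the (possibly poisoned) training set and the defensive set."""
    defensive_set = build_defensive_set(clean_set, apply_trigger, h)
    loader = DataLoader(ConcatDataset([poisoned_set, defensive_set]),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:                    # mini-batches from D_tr ∪ D_def
            optimizer.zero_grad()
            loss = criterion(model(x), y)      # placeholder for objective (2)
            loss.backward()
            optimizer.step()
    return model


@torch.no_grad()
def predict_pdb(model, x, apply_trigger, h_inv):
    """Inference: stamp the defensive trigger, classify, then invert the target mapping."""
    model.eval()
    pred = model(apply_trigger(x)).argmax(dim=1)
    return h_inv(pred)
```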
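
The concrete defensive components from the Experiment Setup row translate into a few lines as well. The class-shift mapping h(y) = (y + 1) mod K, the pixel value 2 for the patch trigger, and the noise augmentation τ(x) = x + 0.1·ϵ are taken from the quoted text; the patch size and corner location are placeholders, since those details are only shown in Figure 2 of the paper.

```python
import torch

K = 10  # number of classes (e.g. CIFAR-10); adjust per dataset


def h(y):
    """Defensive target mapping h(y) = (y + 1) mod K."""
    return (y + 1) % K


def h_inv(y):
    """Inverse mapping applied to predictions at inference time."""
    return (y - 1) % K


def apply_trigger(x, patch_size=4):
    """Stamp a patch trigger with pixel value 2 onto a (C, H, W) or (N, C, H, W) tensor.

    The 4x4 size and top-left corner location are illustrative placeholders.
    """
    x = x.clone()
    x[..., :patch_size, :patch_size] = 2.0
    return x


def tau(x):
    """Noise augmentation τ(x) = x + 0.1·ε with ε ~ N(0, 1)."""
    return x + 0.1 * torch.randn_like(x)
```

In the quoted setup each defensive poisoned sample is drawn five times per epoch; in a sketch like the one above, this amounts to repeating the defensive subset (or weighting its sampler) fivefold when building the joint training loader.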