Reconstructive Neuron Pruning for Backdoor Defense

Authors: Yige Li, Xixiang Lyu, Xingjun Ma, Nodens Koren, Lingjuan Lyu, Bo Li, Yu-Gang Jiang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results across different datasets and model architectures show that our RNP outperforms the current state-of-the-art method ANP (Wu & Wang, 2021) against 9/12 attacks on the CIFAR-10 dataset and 5/5 attacks on an ImageNet subset."
Researcher Affiliation | Collaboration | ¹Xidian University, ²Fudan University, ³Shanghai Artificial Intelligence Laboratory, ⁴University of Copenhagen, ⁵Sony AI, ⁶University of Illinois at Urbana-Champaign
Pseudocode | Yes | Algorithm 1: Reconstructive Neuron Pruning (RNP)
    Input: a backdoored model f_θ(·) with parameters θ, the total number of classes K, defense data D_d, learning rate η, clean accuracy threshold CA_min, dynamic threshold DT ∈ [0, 1]
    1: Sample a mini-batch (X_d, Y_d) from D_d
    # Neuron-level unlearning
    2: repeat
    3:     θ̂ ← max_θ L(f(X_d, Y_d; θ))
    4: until f_θ̂'s clean accuracy CA_θ̂(D_d) ≤ CA_min
    5: Backdoor label: y_t = argmax_K f(X_d; θ̂)
    # Filter-level recovering
    6: m_κ = [1]^n  # initialized to all ones
    7: repeat
    8:     m_κ = m_κ − η · ∂L(f(X_d, Y_d; m_κ ⊙ θ̂)) / ∂m_κ
    9:     m_κ = clip_[0,1](m_κ)  # 0-1 clipping
    10: until training converged
    # Pruning
    11: m_κ = I(m_κ > DT)  # binarization for pruning
    Output: f_{m_κ ⊙ θ}, y_t
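A minimal PyTorch sketch of Algorithm 1 follows. The names (`model`, `defense_loader`) and the forward-hook parameterization of the filter masks are illustrative assumptions, not the authors' implementation; the official code is at https://github.com/bboylyg/RNP.

```python
import torch
import torch.nn.functional as F

def unlearn(model, defense_loader, lr=0.01, weight_decay=5e-2,
            ca_min=0.10, max_epochs=20):
    """Neuron-level unlearning (steps 2-4): gradient-ascend the clean loss
    until clean accuracy on the defense data drops to CA_min."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(max_epochs):
        correct = total = 0
        for x, y in defense_loader:
            logits = model(x)
            (-F.cross_entropy(logits, y)).backward()  # negated loss => ascent
            opt.step(); opt.zero_grad()
            correct += (logits.argmax(1) == y).sum().item()
            total += y.numel()
        if correct / total <= ca_min:  # step 4: clean accuracy <= CA_min
            break
    return model

def attach_filter_masks(model):
    """Step 6: one learnable mask entry per conv filter, initialized to one
    and applied to the conv output through a forward hook."""
    masks = {}
    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Conv2d):
            mask = torch.ones(mod.out_channels, requires_grad=True)
            masks[name] = mask
            mod.register_forward_hook(
                lambda m, inp, out, mask=mask: out * mask.view(1, -1, 1, 1))
    return masks

def recover(model, defense_loader, lr=0.2, epochs=20):
    """Filter-level recovering (steps 7-10): freeze the unlearned weights and
    minimize the clean loss over the masks, clipping them to [0, 1]."""
    for p in model.parameters():
        p.requires_grad_(False)
    masks = attach_filter_masks(model)
    opt = torch.optim.SGD(list(masks.values()), lr=lr)
    for _ in range(epochs):
        for x, y in defense_loader:
            F.cross_entropy(model(x), y).backward()
            opt.step(); opt.zero_grad()
            with torch.no_grad():
                for m in masks.values():
                    m.clamp_(0.0, 1.0)  # step 9: 0-1 clipping
    return masks

def binarize(masks, dt=0.5):
    """Step 11: masks above the dynamic threshold DT are kept (1);
    the rest mark filters to prune (0)."""
    with torch.no_grad():
        for m in masks.values():
            m.copy_((m > dt).float())
    return masks
```

Per the Output line, the binary masks are applied to the original backdoored weights θ rather than the unlearned θ̂, so in practice the hooks would be attached to a fresh copy of the backdoored model before evaluating the pruned network.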
Open Source Code | Yes | "Code is available at https://github.com/bboylyg/RNP."
Open Datasets | Yes | "We follow the default settings suggested in their original papers and the open-source codes for most attacks, including the trigger pattern, trigger size, and backdoor label. As in previous works (Wu & Wang, 2021; Li et al., 2021b), we evaluate the defense performance against the 12 attacks on the CIFAR-10 dataset and an ImageNet-12 dataset."
Dataset Splits | Yes | "All defenses share limited access to only 500 clean samples as their defense data (for both CIFAR-10 and ImageNet-12)."
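A sketch of how such a 500-sample defense set could be drawn from the clean CIFAR-10 training data; the sampling strategy (uniformly random, seeded) is an assumption, as the quote does not specify it.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Assumed: a uniformly random, seeded draw of 500 clean training samples.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
g = torch.Generator().manual_seed(0)                     # fixed seed for reproducibility
idx = torch.randperm(len(train_set), generator=g)[:500]  # 500 clean defense samples
defense_loader = DataLoader(Subset(train_set, idx.tolist()),
                            batch_size=128, shuffle=True)
```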
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions the use of PyTorch for implementation and refers to the open-source code of the baselines (NC, NAD, I-BAU, ANP) but does not provide version numbers for these components or any other libraries.
Experiment Setup | Yes | "We trained all models for 200 epochs using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1, a batch size of 128, and a weight decay of 5e-4 to obtain the backdoored models. The learning rate was divided by 10 at the 60th and 120th epochs. [...] For our RNP defense, we maximized the loss of the unlearned model f_θ̂ for 20 epochs until its clean accuracy dropped to 10% (random guess), with a learning rate of 0.01, a batch size of 128, and a weight decay of 5e-2. For the recovering step, we optimized the filter mask for 20 epochs with a learning rate of 0.2."
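The reported recipe for the backdoored models maps onto a standard PyTorch training loop. A minimal sketch, assuming a `model` and a poisoned `train_loader`; the momentum value is an assumption, as the quote does not state it.

```python
import torch
import torch.nn.functional as F

opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4,
                      momentum=0.9)  # momentum not stated in the quote; assumed
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60, 120],
                                             gamma=0.1)  # lr / 10 at epochs 60, 120
for epoch in range(200):
    for x, y in train_loader:  # batch size 128, poisoned training set
        F.cross_entropy(model(x), y).backward()
        opt.step(); opt.zero_grad()
    sched.step()
```

The defense-stage hyperparameters quoted here (unlearning: lr 0.01, weight decay 5e-2, 20 epochs; recovering: lr 0.2, 20 epochs) match the defaults used in the Algorithm 1 sketch above.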