Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Authors: Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We empirically show, against 6 state-of-the-art backdoor attacks, NAD can effectively erase the backdoor triggers using only 5% clean training data without causing obvious performance degradation on clean examples." |
| Researcher Affiliation | Collaboration | Yige Li (1), Xixiang Lyu (1), Nodens Koren (2), Lingjuan Lyu (3), Bo Li (4), Xingjun Ma (5); (1) Xidian University, (2) The University of Melbourne, (3) Ant Group, (4) University of Illinois at Urbana-Champaign, (5) Deakin University, Geelong |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | "Our code is available at https://github.com/bboylyg/NAD." |
| Open Datasets | Yes | "We test the performance of all attacks and erasing methods on two benchmark datasets, CIFAR-10 and GTSRB, with WideResNet (WRN-16-1) being the base model throughout the experiments." |
| Dataset Splits | Yes | "We assume all defense methods have access to the same 5% of the clean training data," and fine-tuning is stopped "when there are no significant improvements on the validation accuracy within a few epochs (e.g. at epoch 5)." |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as CPU or GPU models or memory capacity. |
| Software Dependencies | No | The paper mentions software such as PyTorch but does not provide version numbers for any of the dependencies required for reproducibility. |
| Experiment Setup | Yes | For NAD, the backdoored model (i.e., the student network) is fine-tuned on the 5% accessible clean data for 10 epochs (results for 20 epochs are in Appendix J) using the Stochastic Gradient Descent (SGD) optimizer with momentum 0.9, an initial learning rate of 0.1, and a weight decay of 10^-4. The learning rate is divided by 10 after every 2 epochs. The batch size is 64, with typical data augmentation: random crop (padding = 4), horizontal flipping, and Cutout (n_holes = 1, length = 9) (DeVries & Taylor, 2017). |
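The learning-rate schedule in the Experiment Setup row (initial rate 0.1, divided by 10 after every 2 epochs, over 10 epochs of fine-tuning) can be sketched in plain Python. This is an illustrative reconstruction of the schedule as described, not the authors' released code; the function name and defaults are assumptions.

```python
def learning_rate(epoch, initial_lr=0.1, drop_every=2, factor=10.0):
    """Step-decay LR for a given 0-indexed epoch, as described in the
    paper's setup: divide the rate by `factor` every `drop_every` epochs."""
    return initial_lr / factor ** (epoch // drop_every)

# The 10-epoch fine-tuning run then uses these per-epoch rates:
# epochs 0-1 at 0.1, epochs 2-3 at 0.01, ..., epochs 8-9 at 1e-5.
schedule = [learning_rate(e) for e in range(10)]
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)` wrapped around an `SGD` optimizer with `momentum=0.9` and `weight_decay=1e-4`, matching the row above.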