Backdoor Attacks on the DNN Interpretation System

Authors: Shihong Fang, Anna Choromanska

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform empirical evaluations of the proposed backdoor attacks on gradient-based interpretation methods, Grad-CAM and Simple Grad, and a gradient-free scheme, VisualBackProp, for a variety of deep learning architectures.
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, NYU Tandon School of Engineering; sf2584@nyu.edu, ac5455@nyu.edu
Pseudocode | Yes | Algorithm 1: Backdoor Attack on the Interpretation System
Require: clean data set D_c, parameters of pre-trained model w_ref, trigger pattern p, number of poisoned examples n.
# Generate poisoned data set
D_p <- {}                                  (initialize the poisoned data set)
for i = 1 to n do
    (x, y) <- randomly sample from D_c
    x_p <- x + p                           (insert trigger)
    D_p <- D_p ∪ {(x_p, y)}
end for
# Train the model
w <- w_ref                                 (initialize w with the pre-trained model)
repeat
    (x, y) <- randomly sample from D_c ∪ D_p
    if (x, y) ∈ D_c then                   (for the inverted setting: (x, y) ∈ D_p)
        w <- argmin_w L_clean(x, y, w)
    else
        w <- argmin_w L_poisoned(x, y, w)
    end if
until convergence
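Read as a PyTorch-style sketch (a hypothetical rendering, not the authors' released code), Algorithm 1 might look as follows; loss_clean, loss_poisoned, the trigger tensor, and the step count are placeholders for quantities the excerpt does not define:

import random
import torch

def poison_dataset(clean_data, trigger, n):
    # Build D_p: add the trigger pattern p to n randomly drawn clean images;
    # labels are kept unchanged, as in Algorithm 1.
    poisoned = []
    for _ in range(n):
        x, y = random.choice(clean_data)          # (x, y) sampled from D_c
        x_p = torch.clamp(x + trigger, 0.0, 1.0)  # x_p = x + p; clamping is an assumption
        poisoned.append((x_p, y))
    return poisoned

def finetune(model, clean_data, poisoned_data, loss_clean, loss_poisoned,
             optimizer, num_steps):
    # Fine-tune the pre-trained model on D_c ∪ D_p: the clean loss is applied to
    # clean samples and the interpretation-attack loss to triggered samples.
    # For the inverted setting, swap which subset receives which loss.
    tagged = ([(x, y, False) for x, y in clean_data]
              + [(x, y, True) for x, y in poisoned_data])
    for _ in range(num_steps):                    # stands in for "until convergence"
        x, y, is_poisoned = random.choice(tagged)
        loss_fn = loss_poisoned if is_poisoned else loss_clean
        loss = loss_fn(model, x.unsqueeze(0), y)  # single-sample step; batching omitted
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()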
Open Source Code | No | The implementations of the Nashville image effect and the Moiré effect are provided by open-source software (https://github.com/akiomik/pilgram, https://github.com/stanfordmlgroup/cheXphoto). (This refers to third-party open-source software used by the authors, not the authors' own implementation code for their methodology.)
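Since the Nashville effect comes from the open-source pilgram package, a minimal sketch of how such a natural-looking trigger could be produced (the file names are illustrative, and the paper's exact preprocessing is not specified here):

from PIL import Image
import pilgram

im = Image.open("clean_sample.jpg")                 # clean input image
pilgram.nashville(im).save("triggered_sample.jpg")  # Nashville filter applied as the trigger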
Open Datasets | Yes | To validate our approach, we conduct experiments on two real-world data sets: the Caltech-UCSD Birds-200-2011 data set (Wah et al. 2011) and ChestX-ray14 (Wang et al. 2017).
Dataset Splits | No | The paper mentions using a validation set for evaluation, for example, "We evaluate our test results on both the validation set and its poisoned variant," but it does not provide specific split percentages, sample counts, or a detailed methodology for partitioning the data into training, validation, and test sets.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance details) used to run its experiments.
Software Dependencies | No | The paper mentions using "PyTorch (Paszke et al. 2019)" but does not provide a specific version number for PyTorch or other software dependencies required to replicate the experiments.
Experiment Setup | Yes | Next we train the models using SGD with a momentum of 0.9 and a weight decay set to 0.0001 for 90 epochs. The initial learning rate was set to 0.001 and is decayed by a factor of 10 every 10 epochs. For the X-ray data set, we use DenseNet121 (Huang et al. 2017) and followed the training details described in (Rajpurkar et al. 2018). All the experiments use the Adam (Kingma and Ba 2015) optimizer and we set the initial learning rate to 1e-5 with a decay of 0.5 applied every 20 epochs.
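Under the assumption that the quoted setup maps onto standard PyTorch optimizers and schedulers, the two configurations could be sketched as follows; the backbone for the birds data set is not named in this excerpt and is an assumption here:

import torch
import torchvision

# Birds data set: SGD, momentum 0.9, weight decay 1e-4, 90 epochs,
# initial learning rate 0.001 decayed by a factor of 10 every 10 epochs.
birds_model = torchvision.models.resnet50(pretrained=True)   # backbone is an assumption
birds_opt = torch.optim.SGD(birds_model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=1e-4)
birds_sched = torch.optim.lr_scheduler.StepLR(birds_opt, step_size=10, gamma=0.1)

# X-ray data set: DenseNet121, Adam, initial learning rate 1e-5,
# decayed by a factor of 0.5 every 20 epochs.
xray_model = torchvision.models.densenet121(pretrained=True)
xray_opt = torch.optim.Adam(xray_model.parameters(), lr=1e-5)
xray_sched = torch.optim.lr_scheduler.StepLR(xray_opt, step_size=20, gamma=0.5)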