Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets

Authors: Ruisi Cai, Zhenyu Zhang, Tianlong Chen, Xiaohan Chen, Zhangyang Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted with three datasets (CIFAR-10, GTSRB, Tiny ImageNet), three architectures (AlexNet, ResNet-20, SENet-18), and three attacks (BadNets [1], clean label attack [2], and WaNet [3]). Results consistently endorse the effectiveness of our proposed technique in backdoor model detection, with margins of 0.291∼0.640 AUROC over the current state-of-the-arts.
Researcher Affiliation | Academia | University of Texas at Austin, {ruisi.cai,zhenyu.zhang,tianlong.chen,xiaohan.chen,atlaswang}@utexas.edu
Pseudocode | No | The paper describes its method and procedures in natural language and figures, but it does not include a formal pseudocode block or algorithm.
Open Source Code | Yes | Codes are available at https://github.com/VITA-Group/Random-Shuffling-Backdoor-Detect.
Open Datasets | Yes | Extensive experiments are conducted with three datasets (CIFAR-10, GTSRB, Tiny ImageNet), three architectures (AlexNet, ResNet-20, SENet-18), and three attacks (BadNets [1], clean label attack [2], and WaNet [3]).
Dataset Splits | No | The paper states that, for detection, the training dataset is split into K subsets according to labels (see the first sketch after the table). However, it does not provide explicit overall training, validation, and test splits (e.g., percentages or counts) for the datasets used to train the models being analyzed.
Hardware Specification | Yes | We perform experiments on 8 2080Ti GPUs.
Software Dependencies | No | The paper mentions optimizers like SGD, but it does not specify software dependencies with version numbers (e.g., Python version, or specific library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | Table 1: Detailed training configurations of the backdoor injection procedure. Detection: To examine the reliability of the training dataset with N classes, we first divide it into N subsets based on their labels. For each subset, we then feed it into the target model as well as its randomly shuffled variant, and compute the associated representation shifts over different numbers of shuffled layers. Based on our observations in Section 3.1, the last few layers mainly encode discriminative features and are therefore used in our detection. In our implementation, we only shuffle the channel order within the last four layers and generate feature sensitivity curves as {y_k[n], k = 0, ..., N-1; n = 0, 1, 2, 3}. Table 4: Detailed configurations of trigger recovery methods. (Illustrative sketches of this procedure follow the table.)
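
As a concrete illustration of the first detection step quoted above, the sketch below groups the (possibly poisoned) training set by class label, which is all the data access the method requires. This is a minimal Python/PyTorch reconstruction of our own; the function name split_by_label and the data layout are assumptions, not code from the paper or its repository.

    from collections import defaultdict
    import torch

    def split_by_label(dataset):
        # Group training samples by class label. No clean, held-out dataset
        # is needed: the possibly poisoned training set itself is inspected.
        subsets = defaultdict(list)
        for image, label in dataset:
            subsets[int(label)].append(image)
        return {k: torch.stack(v) for k, v in subsets.items()}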
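
The core step, randomly shuffling the channel order in the last few layers and measuring the resulting representation shift per class, could look roughly like the following. This is a hedged sketch under our own assumptions, not the authors' implementation: the forward-hook mechanics, the mean-L2 shift measure, and all names (representation_shift, sensitivity_curves, candidate_layers) are ours.

    import torch

    @torch.no_grad()
    def representation_shift(model, candidate_layers, x, n_shuffled, seed=0):
        # Shift between the clean model and a variant whose channel order is
        # randomly permuted in the last n_shuffled of candidate_layers
        # (ordered shallowest to deepest). x is a batch from one class subset.
        gen = torch.Generator().manual_seed(seed)
        clean_out = model(x)
        layers = candidate_layers[-n_shuffled:] if n_shuffled > 0 else []
        handles = []
        for layer in layers:
            def hook(module, inputs, output, gen=gen):
                perm = torch.randperm(output.shape[1], generator=gen)
                return output[:, perm.to(output.device)]  # shuffle channels
            handles.append(layer.register_forward_hook(hook))
        shuffled_out = model(x)
        for h in handles:
            h.remove()
        # One plausible shift measure (our assumption): mean L2 distance
        # between the two output representations.
        return (clean_out - shuffled_out).flatten(1).norm(dim=1).mean().item()

    def sensitivity_curves(model, last4_layers, per_class_batches):
        # y_k[n]: shift for class subset k with the last n layers shuffled,
        # n = 0, 1, 2, 3, mirroring the curves described in the quote above.
        return {k: [representation_shift(model, last4_layers, x, n)
                    for n in range(4)]
                for k, x in per_class_batches.items()}

In a backdoored model, the curve y_k[n] for the attack's target class is expected to stand apart from the other classes' curves, so a standard outlier statistic over the N curves can serve as a detection score; the AUROC figures quoted earlier measure how well such scores separate backdoored from clean models.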