CBD: A Certified Backdoor Detector Based on Local Dominant Probability

Authors: Zhen Xiang, Zidi Xiong, Bo Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and it in addition provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by ℓ2 ≤ 0.75 which achieves more than 90% attack success rate, CBD achieves 100% (98%), 100% (84%), 98% (98%), and 72% (40%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
Researcher Affiliation | Academia | Zhen Xiang, Zidi Xiong, Bo Li; University of Illinois Urbana-Champaign; {zxiangaa, zidix2, lbo}@illinois.edu
Pseudocode | No | The paper describes the detection procedure in step-by-step textual form in Section 3.4 ('CBD Detection Procedure') but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a direct link to a code repository.
Open Datasets | Yes | Dataset: Our experiments are conducted on four benchmark image datasets, GTSRB [60], SVHN [47], CIFAR-10 [30], and TinyImageNet [11], following their standard train-test splits.
Dataset Splits | Yes | Dataset: Our experiments are conducted on four benchmark image datasets, GTSRB [60], SVHN [47], CIFAR-10 [30], and TinyImageNet [11], following their standard train-test splits. Due to the large number of models that will be trained to evaluate our certification method, except for GTSRB, we use 40% of the training set to train these models. We also reserve 5,000 samples from the test set of GTSRB, SVHN, and CIFAR-10, and 10,000 samples from the test set of TinyImageNet (much smaller than the training set size for the models used for evaluation) for the defender to train the shadow models. (A sketch of this split appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing specifications used for running the experiments.
Software Dependencies | No | The paper mentions specific model architectures (e.g., MobileNetV2, ResNet-34) and optimizers (e.g., Adam) but does not provide specific software dependencies or library versions (e.g., Python, PyTorch/TensorFlow versions) used for the experiments.
Experiment Setup | Yes | The poisoning ratios for the attacks on GTSRB, SVHN, CIFAR-10, and TinyImageNet are 7.8%, 15.3%, 11.3%, and 12.4%, respectively. For each attack, we train a model with ≥ 90% attack success rate and ≤ 2% degradation in the benign accuracy (or re-generate the attack for training until both conditions are satisfied). The significance level for conformal prediction is set to the classical α = 0.05 for statistical testing. In our experiments, 1024 random Gaussian noises are generated for each sample used to compute the LDP. In practice, CBD needs to choose a moderately large σ for each detection task. To this end, we first initialize a small σ such that for each of the N shadow models, the SLPVs for the K samples used for computing the LDP all concentrate at the labeled classes. In this case, the LDPs for all the shadow models are close to 1/K. Then, we gradually increase σ until $\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K} p_k(x_k^{(n)} \mid w_n, \sigma) < \psi$ for some relatively small ψ... (A sketch of this calibration loop appears after the table.)
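For concreteness, below is a minimal sketch of the split protocol quoted under 'Dataset Splits'. The 40% training subsample and the 5,000-sample shadow-model reservation (10,000 for TinyImageNet) come from the quoted text; the use of torchvision with CIFAR-10, the seeds, and all variable names are illustrative assumptions, not the authors' code.

```python
# Sketch of the data split described under 'Dataset Splits' (PyTorch/torchvision assumed).
import torch
from torchvision import datasets

full_train = datasets.CIFAR10(root="./data", train=True, download=True)
full_test = datasets.CIFAR10(root="./data", train=False, download=True)

# Except for GTSRB, only 40% of the training set is used to train the
# (many) models needed to evaluate the certification method.
n_train = int(0.4 * len(full_train))
train_subset, _ = torch.utils.data.random_split(
    full_train, [n_train, len(full_train) - n_train],
    generator=torch.Generator().manual_seed(0))

# Reserve 5,000 test samples (10,000 for TinyImageNet) for the defender
# to train the shadow models; the rest remain for evaluation.
n_shadow = 5_000
shadow_set, eval_set = torch.utils.data.random_split(
    full_test, [n_shadow, len(full_test) - n_shadow],
    generator=torch.Generator().manual_seed(1))
```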
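Similarly, here is a sketch of the σ-calibration loop quoted under 'Experiment Setup'. The 1024-noise Monte Carlo estimate of the smoothed labeled-class probability and the stopping rule (average over all N shadow models and K samples falling below ψ) come from the quote; the σ grid, the default ψ value, and the assumption that one shared set of K labeled samples is used for every shadow model are illustrative choices, not the paper's exact procedure.

```python
# Sketch of the sigma-calibration procedure (PyTorch assumed; models in eval mode).
import torch

NUM_NOISE = 1024  # random Gaussian noises per sample, as stated in the paper

@torch.no_grad()
def smoothed_label_prob(model, x, y, sigma, num_noise=NUM_NOISE, batch=256):
    """Monte Carlo estimate of p_y(x | model, sigma): the softmax probability
    of the labeled class y, averaged over Gaussian perturbations of x."""
    total = 0.0
    for i in range(0, num_noise, batch):
        n = min(batch, num_noise - i)
        noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
        total += model(noisy).softmax(dim=1)[:, y].sum().item()
    return total / num_noise

def calibrate_sigma(shadow_models, samples, psi=0.05, sigma=0.05,
                    step=0.05, max_sigma=2.0):
    """Gradually increase sigma, starting from a small value, until
    (1/NK) * sum_n sum_k p_k(x_k | w_n, sigma) < psi.
    `psi`, the starting sigma, and the step size are hypothetical values."""
    while sigma <= max_sigma:
        vals = [smoothed_label_prob(w, x, y, sigma)
                for w in shadow_models for (x, y) in samples]
        if sum(vals) / len(vals) < psi:
            return sigma
        sigma += step
    return sigma  # grid exhausted without satisfying the threshold
```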