CBD: A Certified Backdoor Detector Based on Local Dominant Probability
Authors: Zhen Xiang, Zidi Xiong, Bo Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and it additionally provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by ℓ2 norm 0.75, which achieve more than 90% attack success rate, CBD achieves 100% (98%), 100% (84%), 98% (98%), and 72% (40%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and Tiny ImageNet, respectively, with low false positive rates. |
| Researcher Affiliation | Academia | Zhen Xiang, Zidi Xiong, and Bo Li, University of Illinois Urbana-Champaign ({zxiangaa, zidix2, lbo}@illinois.edu) |
| Pseudocode | No | The paper describes the detection procedure in a step-by-step textual format within Section 3.4 ('CBD Detection Procedure') but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the methodology described, nor does it include a direct link to a code repository. |
| Open Datasets | Yes | Dataset: Our experiments are conducted on four benchmark image datasets, GTSRB [60], SVHN [47], CIFAR-10 [30], and Tiny ImageNet [11], following their standard train-test splits. |
| Dataset Splits | Yes | Dataset: Our experiments are conducted on four benchmark image datasets, GTSRB [60], SVHN [47], CIFAR-10 [30], and Tiny ImageNet [11], following their standard train-test splits. Due to the large number of models that will be trained to evaluate our certification method, except for GTSRB, we use 40% of the training set to train these models. We also reserve 5,000 samples from the test sets of GTSRB, SVHN, and CIFAR-10, and 10,000 samples from the test set of Tiny ImageNet (much smaller than the training set size of the models under evaluation) for the defender to train the shadow models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other detailed computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions specific model architectures (e.g., MobileNetV2, ResNet-34) and optimizers (e.g., Adam) but does not provide specific software dependencies or library versions (e.g., Python, PyTorch/TensorFlow versions) used for the experiments. |
| Experiment Setup | Yes | The poisoning ratios for the attacks on GTSRB, SVHN, CIFAR-10, and Tiny ImageNet are 7.8%, 15.3%, 11.3%, and 12.4%, respectively. For each attack, we train a model with at least 90% attack success rate and at most 2% degradation in the benign accuracy (or re-generate the attack for training until both conditions are satisfied). The significance level for conformal prediction is set to the classical α = 0.05 for statistical testing. In our experiments, 1024 random Gaussian noises are generated for each sample used to compute the LDP. In practice, CBD needs to choose a moderately large σ for each detection task. To this end, we first initialize a small σ such that for each of the N shadow models, the SLPVs for the K samples used for computing the LDP all concentrate at the labeled classes. In this case, the LDPs for all the shadow models are close to 1/K. Then, we gradually increase σ until (1/(NK)) Σ_{n=1}^{N} Σ_{k=1}^{K} p_k(x_k^{(n)} \| w_n, σ) < ψ for some relatively small ψ... |
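The σ-tuning procedure quoted in the Experiment Setup row can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's implementation: a nearest-centroid rule stands in for a shadow model, the 1024-draw Monte-Carlo estimate plays the role of the smoothed label probability vector (SLPV), and the function and parameter names (`slpv`, `mean_labeled_prob`, `tune_sigma`, `psi`, `step`) are all hypothetical.

```python
import numpy as np

# Hypothetical toy classifier: nearest centroid over three classes.
# This stands in for a shadow model; the centroids are assumptions.
CENTROIDS = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

def classify(X):
    """Predict the label of each row of X by nearest centroid."""
    d2 = ((X[:, None, :] - CENTROIDS[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)

def slpv(x, sigma, n_noise=1024, n_classes=3):
    """Monte-Carlo smoothed label probability vector: the class
    distribution of classify(x + Gaussian noise) over n_noise draws."""
    rng = np.random.default_rng(0)  # fixed seed keeps the sketch deterministic
    noise = rng.normal(0.0, sigma, size=(n_noise, x.shape[0]))
    preds = classify(x[None, :] + noise)
    return np.bincount(preds, minlength=n_classes) / n_noise

def mean_labeled_prob(samples, labels, sigma):
    """Average smoothed probability that each sample keeps its labeled
    class: the quantity the tuning rule compares against psi."""
    return float(np.mean([slpv(x, sigma)[y] for x, y in zip(samples, labels)]))

def tune_sigma(samples, labels, psi=0.4, sigma=0.05, step=1.5):
    """Grow sigma geometrically until the mean labeled-class probability
    falls below psi, mirroring the quoted tuning heuristic."""
    while mean_labeled_prob(samples, labels, sigma) >= psi:
        sigma *= step
    return sigma

samples = CENTROIDS.copy()      # one sample per class, placed at the centroids
labels = np.array([0, 1, 2])
small = mean_labeled_prob(samples, labels, 0.05)  # near 1 at small sigma
sigma = tune_sigma(samples, labels)               # grows until criterion holds
```

At a small initial σ the SLPVs concentrate on the labeled classes (the averaged probability is near 1, matching the "LDPs close to 1/K" regime), and the loop then scales σ up until the averaged labeled-class probability drops below ψ.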