Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

BBCaL: Black-box Backdoor Detection under the Causality Lens

Authors: Mengxuan Hu, Zihan Guan, Junfeng Guo, Zhongliang Zhou, Jielu Zhang, Sheng Li

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on three benchmark datasets validate the effectiveness and efficiency of our method." "Extensive experiments demonstrate that our method can defend against a broader range of attacks with satisfactory efficiency. Theoretical analysis also sheds light on the effectiveness of the BBCaL." Theoretical proof: "We provide a theoretical analysis (Theorem 8) to validate our method."
Researcher Affiliation | Collaboration | Mengxuan Hu (University of Virginia); Zihan Guan (University of Virginia); Junfeng Guo (University of Maryland); Zhongliang Zhou (Merck); Jielu Zhang (University of Georgia); Sheng Li (University of Virginia)
Pseudocode | Yes | Algorithm 1: FPS Score Calculation; Algorithm 2: The Backdoor Detection Method.
Open Source Code | No | The paper states that "All attack baselines are implemented with the open-sourced backdoor learning toolbox (Li et al., 2023)" and provides GitHub links for baseline defense methods (e.g., "STRIP (Gao et al., 2019): We follow the official implementation of STRIP. https://github.com/garrisongys/STRIP"). However, no statement or link is provided for the open-source code of the BBCaL method described in this paper.
Open Datasets | Yes | "We choose three popular datasets for evaluating the effectiveness of our proposed method: CIFAR-10 (Krizhevsky, 2009), GTSRB (Stallkamp et al., 2012), and ImageNet-subset. The details of the three datasets are listed in Table 3." The ImageNet-subset reference is the fastai/imagenette GitHub repository ("A smaller subset of 10 easily classified classes from Imagenet, and a little more French"), https://github.com/fastai/imagenette [Accessed 14-08-2024].
Dataset Splits | Yes | "The details of the dataset are given in Table 3." Table 3 (Statistical information about the datasets):
  Dataset          Image Size  Training Samples  Testing Samples  Classes
  CIFAR-10         32×32×3     50,000            10,000           10
  GTSRB            32×32×3     39,209            12,630           43
  ImageNet-Subset  224×224×3   9,469             3,925            10
Hardware Specification | Yes | "All the experiments are evaluated on an NVIDIA RTX A5000 GPU with 24GB GPU memory."
Software Dependencies | No | The paper mentions the use of a "backdoor learning toolbox (Li et al., 2023)" and links to implementations of baseline methods, but it does not specify any programming language versions (e.g., Python), library versions (e.g., PyTorch, TensorFlow), or other versioned software dependencies for its own methodology.
Experiment Setup | Yes | "Following the previous works in backdoor defense (Li et al., 2021a), the poisoning ratio for backdoor attacks is set as 10% as default. The length of the magnitude set S has been set to 7 based on the experiments described in Section 5.3. According to our previous analysis, α should be small to detect sample-specific backdoor attacks, while β should be large to identify sample-agnostic backdoor attacks. Hence, the values of α and β are set as 1 and 6, respectively, which represent the first position and the second-to-last position in the magnitude set. ... we train the neural network for 200 epochs to achieve a clean accuracy of at least 90% and an attack success rate of at least 98%."
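For concreteness, the quoted hyperparameter choices can be collected into a short configuration sketch. This is not the authors' code: the constants come from the quoted setup, while the contents of the magnitude set S are illustrative placeholders.

```python
# Hedged sketch of the reported BBCaL experiment configuration.
# Values for poisoning ratio, |S|, alpha, beta, and epochs are quoted from
# the paper's setup; the magnitudes in S below are placeholders only.
POISONING_RATIO = 0.10   # default poisoning ratio for backdoor attacks
MAGNITUDE_SET_LEN = 7    # |S| = 7, chosen via the paper's Section 5.3
EPOCHS = 200             # trained until >=90% clean accuracy, >=98% ASR

S = [0.5 * (i + 1) for i in range(MAGNITUDE_SET_LEN)]  # placeholder magnitudes

# alpha = 1 selects the first (smallest) position, used to detect
# sample-specific attacks; beta = 6 selects the second-to-last (large)
# position, used to detect sample-agnostic attacks.
alpha, beta = 1, 6
s_alpha = S[alpha - 1]   # first element of S
s_beta = S[beta - 1]     # second-to-last element of S
```

The point of the sketch is the indexing convention: with a 1-based position in a length-7 set, α = 1 maps to `S[0]` and β = 6 maps to `S[-2]`, matching the "first position" and "second-to-last position" described in the quote.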