Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Authors: Danni Yuan, Mingda Zhang, Shaokui Wei, Li Liu, Baoyuan Wu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments under various settings of backdoor attacks demonstrate the superior detection performance of the proposed method to existing poisoned detection approaches according to sample activation-based metrics. Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Researcher Affiliation | Academia | ¹School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; ²The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | Yes | Algorithm 1 Filtering out poisoned samples within the identified target class(es). |
| Open Source Code | Yes | Codes are available at https://github.com/SCLBD/BackdoorBench (PyTorch) |
| Open Datasets | Yes | We use CIFAR-10 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015) as primary datasets to evaluate the detection performance. Additionally, we expand our evaluation to the datasets that are closer to real-world scenarios, such as ImageNet (Deng et al., 2009) subset (200 classes), DTD (Cimpoi et al., 2014), and GTSRB (Houben et al., 2013) |
| Dataset Splits | Yes | The poisoning ratio in our main evaluation is 10% for non-clean label attacks and 5% for clean label attacks. The target label t is set to 0 for all-to-one backdoor attack, while target labels are set to t = (y + 1) mod K for all-to-all backdoor attack. The detailed experimental settings are provided in Appendix B.3. For a fair comparison, we maintain that the number of clean samples per class is 10, extracted from the test dataset. |
| Hardware Specification | Yes | Tab. 17 illustrates the computation complexity and time (based on RTX A5000 GPU) of AGPD and the compared detection method under eight backdoor attacks with 10% poisoning ratio on CIFAR-10. |
| Software Dependencies | No | The paper mentions 'PyTorch' in the abstract, but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The thresholds used in AGPD, τ_z and τ_s, are e² and 0.05, respectively. Table 6: The common hyperparameters for training across five datasets. Dataset: CIFAR-10, Epoch: 100, Learning rate: 0.01, Batch size: 128, Optimizer: SGD. |
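
The poisoning setup and training hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be summarized as a small Python sketch. It encodes only the reported numbers; the helper names (`target_label`, `choose_poison_indices`, `reserve_clean_per_class`) are hypothetical, uniform random selection of poisoned indices is an assumption, and treating the τ_z threshold as e² (about 7.389) is an interpretation of the extracted text. It is not the BackdoorBench implementation of AGPD.

```python
import math
import random

# Reported settings from the Dataset Splits and Experiment Setup rows.
# All identifiers here are hypothetical; this is not BackdoorBench's API.
POISON_RATIO = {"non_clean_label": 0.10, "clean_label": 0.05}
CLEAN_SAMPLES_PER_CLASS = 10  # drawn from the test split
TRAIN_CONFIG = {
    "dataset": "CIFAR-10",
    "epochs": 100,
    "lr": 0.01,
    "batch_size": 128,
    "optimizer": "SGD",
}
TAU_Z = math.exp(2)  # assumes the reported threshold means e^2 (about 7.389)
TAU_S = 0.05


def target_label(y: int, num_classes: int, mode: str) -> int:
    """Backdoor target label for a sample whose true label is y."""
    if mode == "all_to_one":
        return 0  # single target class t = 0
    if mode == "all_to_all":
        return (y + 1) % num_classes  # t = (y + 1) mod K
    raise ValueError(f"unknown attack mode: {mode}")


def choose_poison_indices(num_samples: int, ratio: float, seed: int = 0) -> list[int]:
    """Uniformly sample round(ratio * num_samples) training indices to poison (assumption)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_samples), round(ratio * num_samples)))


def reserve_clean_per_class(test_labels: list[int], per_class: int, seed: int = 0) -> dict[int, list[int]]:
    """Pick `per_class` test indices from every class as the clean reference set."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = {}
    for i, y in enumerate(test_labels):
        by_class.setdefault(y, []).append(i)
    return {c: sorted(rng.sample(idx, per_class)) for c, idx in sorted(by_class.items())}


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]  # toy 10-class label list
    poisoned = choose_poison_indices(len(labels), POISON_RATIO["non_clean_label"])
    print(len(poisoned))  # 100
    print([target_label(y, 10, "all_to_all") for y in (0, 5, 9)])  # [1, 6, 0]
    clean = reserve_clean_per_class(labels, CLEAN_SAMPLES_PER_CLASS)
    print(sum(len(v) for v in clean.values()))  # 100
```

With 10% poisoning on a 1,000-sample toy set, 100 indices are selected; the all-to-all mapping wraps class 9 back to class 0, matching t = (y + 1) mod K with K = 10.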