Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bi-perspective Splitting Defense: Achieving Clean-Seed-Free Backdoor Security
Authors: Yangyang Shen, Xiao Tan, Dian Shen, Meng Wang, Beilun Wang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments... Datasets and DNN models. We adopt three benchmark datasets for the evaluation of the backdoor defenses, namely, CIFAR-10 (Krizhevsky et al., 2009), GTSRB (Stallkamp et al., 2012), and ImageNet (Deng et al., 2009). The results are conducted with ResNet-18 (He et al., 2016) and MobileNet-v2 (Sandler et al., 2018) as the backbone models for their representativeness and widespread use. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University, Nanjing, China 2College of Design and Innovation, Tongji University, Shanghai, China. Correspondence to: Beilun Wang <EMAIL>. |
| Pseudocode | Yes | A. Algorithm outline The pseudocode of the proposed method BSD is listed as Algorithm 1. Algorithm 1 Pseudocode for BSD |
| Open Source Code | No | The baselines are implemented using: BackdoorBench (Wu et al., 2022); BackdoorBox (Li et al., 2023b); GitHub repositories of corresponding papers. We greatly appreciate these outstanding works. |
| Open Datasets | Yes | Datasets and DNN models. We adopt three benchmark datasets for the evaluation of the backdoor defenses, namely, CIFAR-10 (Krizhevsky et al., 2009), GTSRB (Stallkamp et al., 2012), and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | Datasets and DNN models. We adopt three benchmark datasets for the evaluation of the backdoor defenses, namely, CIFAR-10 (Krizhevsky et al., 2009), GTSRB (Stallkamp et al., 2012), and ImageNet (Deng et al., 2009). ... We assess the effectiveness of backdoor defenses using two widely used metrics: Clean Accuracy (CA) and the attack success rate (ASR). To be specific, the CA is the accuracy on clean data, and the ASR is defined as the proportion of poisoned samples that are misclassified as the target class by the model. |
| Hardware Specification | Yes | B.1. Environments We run all the experiments using PyTorch on a Linux server with an AMD EPYC 7H12 64-core Processor, 256GB RAM, and 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | We run all the experiments using PyTorch on a Linux server with an AMD EPYC 7H12 64-core Processor, 256GB RAM, and 8 NVIDIA GeForce RTX 3090 GPUs. |
| Experiment Setup | Yes | For our BSD, we adopt the MixMatch (Berthelot et al., 2019b) semi-supervised training framework for the main model... The semi-supervised learning parameters align with ASD, including 1024 training iterations, a temperature of 0.5, a ramp-up length of 120, and a learning rate of 0.002. The altruistic model undergoes a warm-up phase with 25 epochs, utilizing the Adam optimizer, Cross-Entropy loss, with a learning rate of 0.001. The default warm-up epochs for the main model in OSS are set to 20 (followed by a 10-epoch training on the initialized pools) (T1 = 20), with a default β of 0.2. Class completion training spans 60 epochs (T2 = 90), and selective dropping training spans 30 epochs (T3 = 120). |
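The staged schedule quoted in the Experiment Setup row (warm-up through T1, class completion through T2, selective dropping through T3, with cumulative epoch boundaries) can be sketched as a plain configuration. This is a minimal, hypothetical reconstruction from the quoted numbers only; the names `BSD_SCHEDULE` and `stage_at` are illustrative and not taken from the authors' code.

```python
# Hypothetical sketch of the BSD training schedule from the quoted setup;
# all identifiers are illustrative, not from the authors' implementation.

BSD_SCHEDULE = {
    # Altruistic model warm-up: Adam optimizer, cross-entropy loss.
    "altruistic_warmup": {"optimizer": "Adam", "lr": 0.001, "epochs": 25},
    # Main model: MixMatch semi-supervised parameters (aligned with ASD).
    "semi_supervised": {
        "iterations": 1024,
        "temperature": 0.5,
        "rampup_length": 120,
        "lr": 0.002,
    },
    # Cumulative epoch boundaries as quoted: warm-up ends at T1, class
    # completion ends at T2, selective dropping ends at T3.
    "T1": 20,   # main-model warm-up (then 10 epochs on initialized pools)
    "T2": 90,   # class completion: 60 epochs, ending at epoch 90
    "T3": 120,  # selective dropping: 30 epochs, ending at epoch 120
    "beta": 0.2,
}

def stage_at(epoch: int, s: dict = BSD_SCHEDULE) -> str:
    """Return which training stage is active at a given (0-indexed) epoch."""
    if epoch < s["T1"]:
        return "warmup"
    if epoch < s["T1"] + 10:          # 10 epochs on the initialized pools
        return "initialized_pools"
    if epoch < s["T2"]:
        return "class_completion"
    if epoch < s["T3"]:
        return "selective_dropping"
    return "done"
```

Reading the boundaries as cumulative makes the quoted spans consistent: class completion covers epochs 30 to 90 (60 epochs) and selective dropping covers epochs 90 to 120 (30 epochs).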