Scanning Trojaned Models Using Out-of-Distribution Samples

Authors: Hossein Mirzaei, Ali Ansari, Bahar Dibaei Nia, Mojtaba Nafez, Moein Madadi, Sepehr Rezaee, Zeinab Sadat Taghavi, Arad Maleki, Kian Shamsaie, Mahdi Hajialilue, Jafar Habibi, Mohammad Sabokrou, Mohammad Hossein Rohban

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental (6 experiments) | We evaluated our proposed method across a diverse range of benchmarks and compared its performance with various existing scanning methods. We developed our own benchmark, which includes models trained on a broad spectrum of image datasets and covers trojaned models under a variety of attack scenarios. The results of these experiments are provided in Table 1. |
| Researcher Affiliation | Academia | Hossein Mirzaei¹, Ali Ansari¹, Bahar Dibaei Nia¹, Mojtaba Nafez¹, Moein Madadi¹, Sepehr Rezaee², Zeinab Sadat Taghavi¹, Arad Maleki¹, Kian Shamsaie¹, Mahdi Hajialilue¹, Jafar Habibi¹, Mohammad Sabokrou³, Mohammad Hossein Rohban¹ (¹Sharif University of Technology; ²Shahid Beheshti University; ³Okinawa Institute of Science and Technology) |
| Pseudocode | Yes | The pseudocode of our scanning algorithm is provided in Algorithm 1. (Refer to Algorithm 1 in Appendix D; a hedged sketch of the procedure appears below this table.) |
| Open Source Code | Yes | The code repository is available at https://github.com/rohban-lab/TRODO. |
| Open Datasets | Yes | Our benchmark comprises image datasets from various domains, including CIFAR10, CIFAR100 [61], GTSRB [64], PubFig [65], and MNIST. ... Specifically, we utilize Tiny ImageNet [60] for this purpose. |
| Dataset Splits | Yes | In this study, we assume access to a benign validation set, denoted Dv (e.g., Tiny ImageNet), which is realistic given the abundance of available datasets in real-world scenarios. ... The test set for each combination of image dataset and label mapping consists of a total of 320 models: 20 trojaned models per attack (i.e., 160 trojaned models across eight attacks) and 160 clean models (see Appendix Section N for more details). |
| Hardware Specification | Yes | Our experiments on our method and other baselines were conducted on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the BackdoorBench framework [98] and another GitHub repository for color attacks, but does not specify versions for general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We set k = 3 as a rule of thumb. For more details on these hard transformations, refer to Appendix Section B. ... We used PGD-10 as the adversarial attack. ... We consider 0.5 as a hyperparameter, denoted by γ, which we refer to as the boundary confidence level. (These settings are reflected in the sketch below.) |
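
The Pseudocode and Experiment Setup rows above describe the core of the scanning procedure: adversarially perturb out-of-distribution samples with PGD-10 and check whether the model's confidence on them can be pushed past the boundary confidence level γ = 0.5. Below is a minimal sketch of that loop, assuming a maximum-softmax-probability (MSP) OOD score and a confidence-maximizing PGD objective. The function names, the ε and α step sizes, and the final decision rule are illustrative assumptions rather than the authors' implementation (the reference code is at the repository linked above), and the sketch omits the k = 3 hard transformations applied to the OOD samples.

```python
# Hedged sketch of a TRODO-style scan (cf. Algorithm 1 / Appendix D).
# All names and hyperparameters below other than PGD-10 and gamma = 0.5
# are assumptions made for illustration.
import torch
import torch.nn.functional as F


def msp_score(model, x):
    """Maximum softmax probability, a common OOD score (assumed here)."""
    with torch.no_grad():
        return F.softmax(model(x), dim=1).max(dim=1).values


def pgd_toward_indistribution(model, x, steps=10, eps=8 / 255, alpha=2 / 255):
    """PGD-10 that increases the classifier's confidence on OOD inputs,
    pushing them toward the in-distribution region (assumed objective)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Cross-entropy against the model's own predicted class:
        # descending this loss ascends predicted-class confidence.
        loss = F.cross_entropy(logits, logits.argmax(dim=1))
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into the eps-ball and the valid pixel range.
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()


def scan_model(model, ood_batch, gamma=0.5):
    """Return the fraction of OOD samples whose confidence is pushed past
    the boundary confidence level gamma (0.5 in the paper). A large
    fraction is hypothesized here to signal the trojan's blind spots."""
    model.eval()
    x_adv = pgd_toward_indistribution(model, ood_batch)
    shifted = msp_score(model, x_adv)
    return (shifted > gamma).float().mean().item()
```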
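
For orientation, an illustrative invocation of the sketch above, using a randomly initialized classifier and random tensors in place of the Tiny ImageNet OOD batch and the benchmark's 320 checkpoints:

```python
import torch
import torchvision.models as models

# Stand-ins for the paper's setup: a real scan would load each of the
# 320 benchmark checkpoints and draw OOD batches from Tiny ImageNet.
model = models.resnet18(num_classes=10)   # untrained stand-in model
ood_batch = torch.rand(32, 3, 224, 224)   # placeholder OOD images in [0, 1]

fraction_shifted = scan_model(model, ood_batch, gamma=0.5)
print(f"fraction of OOD samples pushed past gamma: {fraction_shifted:.3f}")
```

One plausible way to turn this score into a clean/trojaned decision is to calibrate a threshold on the benign validation set Dv mentioned in the Dataset Splits row; the paper's exact decision rule is given in Algorithm 1.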