Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Authors: Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, Guangyu Shen, Siyuan Cheng, Shiqing Ma, Pin-Yu Chen, Tsung-Yi Ho, Xiangyu Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework ELIJAH on hundreds of DMs of 3 types including DDPM, NCSN and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility. |
| Researcher Affiliation | Collaboration | Shengwei An (1), Sheng-Yen Chou (2), Kaiyuan Zhang (1), Qiuling Xu (1), Guanhong Tao (1), Guangyu Shen (1), Siyuan Cheng (1), Shiqing Ma (3), Pin-Yu Chen (4), Tsung-Yi Ho (2), Xiangyu Zhang (1). Affiliations: (1) Purdue University; (2) The Chinese University of Hong Kong; (3) University of Massachusetts Amherst; (4) IBM Research. |
| Pseudocode | Yes | Algorithm 2 in the appendix shows the pseudocode. Algorithm 3 in the appendix illustrates the feature extraction. Algorithm 4 shows backdoor detection in this setting. The procedure is described in Algorithm 5. Our backdoor removal method is described by Algorithm 6 in the appendix (An et al. 2023). |
| Open Source Code | Yes | Our code: https://github.com/njuaplusplus/Elijah |
| Open Datasets | Yes | We mainly use the CIFAR-10 (Krizhevsky, Hinton et al. 2009) and downscaled CelebA-HQ (Karras et al. 2018) datasets as they are the two datasets considered in the evaluated backdoor attack methods. |
| Dataset Splits | No | The paper does not explicitly specify the train/validation/test splits (e.g., percentages or sample counts) for the CIFAR-10 or CelebA-HQ datasets as used in its experiments, nor does it specify the splits for the set of clean and backdoored models used for training and testing the detector. |
| Hardware Specification | Yes | Our main experiments run on a server equipped with Intel Xeon Silver 4214 2.40GHz 12-core CPUs with 188 GB RAM and NVIDIA Quadro RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions implementing the framework in PyTorch: 'We implement our framework ELIJAH including trigger inversion, backdoor detection, and backdoor removal algorithms in PyTorch (Paszke et al. 2019)'. However, it does not specify version numbers for PyTorch or any other software libraries, which is required for a reproducible description. |
| Experiment Setup | Yes | In our experiments, we set λ = 0.5. |
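
The Software Dependencies row notes that no library versions are reported. As a minimal sketch of how such versions could be recorded alongside an experiment run, the snippet below queries installed package metadata using only the Python standard library. The helper name `collect_versions` is hypothetical, and the package list (`torch`, `torchvision`, `numpy`) is an assumption for illustration; the paper itself names only PyTorch and does not state which other packages it depends on.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if it is not installed.

    Hypothetical helper for illustration; not part of the ELIJAH code base.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the entry so missing dependencies are visible in the report.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust to the dependencies actually used.
    for name, version in collect_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output next to the experiment logs would let a reader pin the same versions when rerunning the released code.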