DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

Authors: Mintong Kang, Dawn Song, Bo Li

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the attack effectiveness of DiffAttack compared with existing adaptive attacks on CIFAR-10 and ImageNet. We conduct a series of ablation studies, and we find... In this section, we evaluate DiffAttack from various perspectives empirically. As a summary, we find that 1) DiffAttack significantly outperforms other SOTA attack methods against diffusion-based defenses on both the score-based purification and DDPM-based purification models, especially under large perturbation radii (Section 4.2 and Section 4.3); 2) DiffAttack outperforms other strong attack methods such as the black-box attack and adaptive attacks against other adversarial purification defenses (Section 4.4); 3) a moderate diffusion length T benefits the model robustness, since a too long/short diffusion length would hurt the robustness (Section 4.5); 4) our proposed segment-wise forwarding-backwarding algorithm achieves O(1)-memory cost and outperforms other baselines by a large margin (Section 4.6); and 5) attacks with the deviated-reconstruction loss added over uniformly sampled time steps outperform those added over only initial/final time steps (Section 4.7).
Researcher Affiliation | Academia | Mintong Kang (UIUC, mintong2@illinois.edu); Dawn Song (UC Berkeley, dawnsong@berkeley.edu); Bo Li (UIUC, lbo@illinois.edu)
Pseudocode | Yes | We provide the pseudo-codes of DiffAttack in Algorithm 2 in Appendix D.1. Algorithm 1: Segment-wise forwarding-backwarding algorithm (PyTorch-like pseudo-codes); Algorithm 2: DiffAttack. (An illustrative sketch of the segment-wise idea is given after the table.)
Open Source Code | Yes | The codes are publicly available at https://github.com/kangmintong/DiffAttack.
Open Datasets | Yes | Dataset & model. We validate DiffAttack on CIFAR-10 [27] and ImageNet [13].
Dataset Splits | No | The paper states that it uses CIFAR-10 and ImageNet and samples a subset of 512 images from the test set for evaluation, but it does not explicitly describe a train/validation/test split for reproducibility purposes; in particular, no validation set is specified. (An assumed subset-sampling sketch is given after the table.)
Hardware Specification | Yes | The evaluation is done on an RTX A6000 GPU with 49,140 MB memory.
Software Dependencies | No | The paper mentions implementing DiffAttack in the framework of AutoAttack and using PyTorch, but it does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9', 'AutoAttack vX.Y').
Experiment Setup | Yes | Specifically, the number of attack iterations (N_iter) is 100, and the number of iterations used to approximate the gradients (EOT) is 20. The momentum coefficient α is 0.75, and the step size η is initialized to 2ϵ, where ϵ is the maximum ℓp-norm of the perturbations. The balance factor λ between the classification-guided loss and the deviated-reconstruction loss in Equation (8) is fixed as 1.0, and α(·) is set to the reciprocal of the number of sampled time steps in the evaluation. We consider ϵ = 8/255 and ϵ = 4/255 for the ℓ∞ attack and ϵ = 0.5 for the ℓ2 attack, following the literature [11, 12]. (These reported values are collected into an illustrative configuration after the table.)
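
The paper's Algorithm 1 (segment-wise forwarding-backwarding) is not reproduced on this page. As a rough illustration of the idea, the sketch below uses PyTorch's built-in gradient checkpointing to backpropagate through a long denoising chain while storing only segment-boundary tensors; intermediate activations inside each segment are recomputed during the backward pass. The function names (`purify_with_segments`, `denoise_step`), the segment length, and the use of `torch.utils.checkpoint` are assumptions for illustration, not the authors' implementation, and the memory behavior here is that of standard checkpointing rather than the paper's exact O(1)-memory procedure.

```python
import torch
from torch.utils.checkpoint import checkpoint

def purify_with_segments(x, denoise_step, num_steps, segment_len=10):
    """Differentiable purification over a long denoising chain.
    Only segment-boundary tensors are kept for the backward pass;
    activations inside a segment are recomputed when needed."""

    def run_segment(x_seg, t_start, t_end):
        # Re-executed during backward to rebuild this segment's activations.
        for t in range(t_start, t_end):
            x_seg = denoise_step(x_seg, t)
        return x_seg

    for seg_start in range(0, num_steps, segment_len):
        seg_end = min(seg_start + segment_len, num_steps)
        # Checkpoint the whole segment: only its output tensor is stored.
        x = checkpoint(run_segment, x, seg_start, seg_end, use_reentrant=False)
    return x
```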
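Since the paper only states that 512 test images are sampled for evaluation, the following is one plausible way to build such an evaluation subset on CIFAR-10; the random seed, batch size, and fixed-seed random draw are assumptions, not the authors' selection procedure.

```python
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

# Assumed: a fixed-seed random draw of 512 CIFAR-10 test images.
# The paper does not specify the seed or selection strategy.
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())
indices = torch.randperm(len(test_set),
                         generator=torch.Generator().manual_seed(0))[:512]
eval_loader = DataLoader(Subset(test_set, indices.tolist()), batch_size=64)
```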
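For concreteness, the hyperparameters reported in the Experiment Setup row can be collected into a configuration like the one below. The dictionary layout, key names, and the `diffattack_loss` helper are illustrative assumptions; only the numeric values come from the paper, and the exact form of the objective in Equation (8) may differ.

```python
# Hyperparameter values as reported in the paper; the dictionary layout,
# key names, and the loss-combination helper are illustrative assumptions.
ATTACK_CONFIG = {
    "n_iter": 100,            # attack iterations (N_iter)
    "eot_samples": 20,        # gradient evaluations averaged per step (EOT)
    "momentum": 0.75,         # momentum coefficient
    "epsilon_linf": 8 / 255,  # l_inf budget (4/255 also evaluated)
    "epsilon_l2": 0.5,        # l_2 budget
    "lambda_balance": 1.0,    # weight between the two loss terms
}
# Step size eta is initialized to twice the perturbation budget.
ATTACK_CONFIG["step_size"] = 2 * ATTACK_CONFIG["epsilon_linf"]

def diffattack_loss(cls_loss, dev_recon_loss,
                    lam=ATTACK_CONFIG["lambda_balance"]):
    """Sketch of the combined objective in the spirit of Equation (8):
    classification-guided loss plus a weighted deviated-reconstruction
    loss. The paper's exact formulation may differ."""
    return cls_loss + lam * dev_recon_loss
```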