Adversarial Purification with Score-based Generative Models
Authors: Jongmin Yoon, Sung Ju Hwang, Juho Lee
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate our defense method ADP from various perspectives. First, we evaluate ADP under the strongest existing attacks on ℓ∞-bounded threat models and compare it to other state-of-the-art defense methods including adversarial training and adversarial purification. Then we show the certified robustness of our model on ℓ2-bounded threat models and compare it to other existing randomized classifiers. We further verify the perceptual robustness of our method with common corruptions (Hendrycks & Dietterich, 2019) on CIFAR-10. We further validate ours on a variety of datasets including MNIST, Fashion MNIST, and CIFAR-100. |
| Researcher Affiliation | Collaboration | 1Korea Advanced Institute of Science and Technology, Daejeon, Korea 2AITRICS, Seoul, Korea. |
| Pseudocode | Yes | Algorithm 1 Adversarial purification with ADP |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We further verify the perceptual robustness of our method with common corruptions (Hendrycks & Dietterich, 2019) on CIFAR-10. We further validate ours on a variety of datasets including MNIST, Fashion MNIST, and CIFAR-100. |
| Dataset Splits | No | While these can be tuned with an additional validation set, we propose a simple yet effective adaptation scheme that can choose proper step-sizes during the purification. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For all experiments, we use WideResNet (Zagoruyko & Komodakis, 2016) with depth 28 and width factor 10, having 36.5M parameters. For the score model, we use NCSN having 29.7M parameters. Unless otherwise stated, for ADP, we fixed the adaptive step size parameters (λ, δ) = (0.05, 10⁻⁵), and computed ensembles over S = 10 purification runs, i.e., we take 10 random noise injections with Gaussian noise ε ∼ N(0, σ²I), followed by clipping to [0, 1]. The purification stopping threshold τ was fixed to 0.001. As aforementioned, the noise standard deviation σ was fixed to 0.25 for all experiments. |
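The setup above (noise injection with σ = 0.25, adaptive step sizes governed by (λ, δ), a stopping threshold τ, and an ensemble over S = 10 runs) can be sketched as a small purification loop. This is a minimal illustrative sketch, not the paper's Algorithm 1: the function names `purify` and `adp_ensemble_predict`, the toy `score_fn`/`classify_fn` interfaces, and the exact adaptive step-size rule (λ scaled by the score norm, stabilized by δ) are assumptions made for illustration.

```python
import numpy as np

def purify(x, score_fn, lam=0.05, delta=1e-5, tau=1e-3,
           sigma=0.25, max_steps=100, rng=None):
    """One purification run (illustrative sketch of ADP).

    Inject Gaussian noise N(0, sigma^2 I), then ascend the score
    field with an adaptive step size until the update norm falls
    below the stopping threshold tau. All inputs clipped to [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    x_t = np.clip(x + sigma * rng.standard_normal(x.shape), 0.0, 1.0)
    for _ in range(max_steps):
        s = score_fn(x_t)
        # Assumed adaptive rule: step size lam normalized by the
        # score magnitude, with delta preventing division by zero.
        alpha = lam / (np.linalg.norm(s) + delta)
        update = alpha * s
        x_t = np.clip(x_t + update, 0.0, 1.0)
        if np.linalg.norm(update) < tau:  # stopping threshold tau
            break
    return x_t

def adp_ensemble_predict(x, score_fn, classify_fn, S=10, **kw):
    """Average classifier outputs over S independent purification
    runs (the paper ensembles over S = 10 random noise injections)."""
    outputs = [classify_fn(purify(x, score_fn, **kw)) for _ in range(S)]
    return np.mean(outputs, axis=0)
```

In practice `score_fn` would be the trained NCSN score network and `classify_fn` the WideResNet-28-10 classifier; here they are placeholders so the control flow of noise injection, adaptive stepping, and ensembling is visible.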