Adversarial Purification with the Manifold Hypothesis

Authors: Zhaoyuan Yang, Zhiwei Xu, Jing Zhang, Richard Hartley, Peter Tu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, our approach can provide adversarial robustness even if attackers are aware of the existence of the defense. In addition, our method can also serve as a test-time defense mechanism for variational autoencoders. Code is available at: https://github.com/GoL2022/AdvPFY. Experiments: We first evaluate our method on MNIST (LeCun, Cortes, and Burges 2010), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), and CIFAR-10 (Krizhevsky and Hinton 2009), followed by CIFAR-100 (Krizhevsky and Hinton 2009) and CelebA (64×64 and 128×128) for gender classification (Liu et al. 2015). See Yang et al. (2023) for the dataset details.
Researcher Affiliation | Collaboration | Zhaoyuan Yang (1), Zhiwei Xu (2), Jing Zhang (2), Richard Hartley (2), Peter Tu (1); (1) GE Research, Niskayuna, NY; (2) Australian National University, Canberra, Australia. {zhaoyuan.yang,tu}@ge.com, {zhiwei.xu,jing.zhang,richard.hartley}@anu.edu.au
Pseudocode | Yes | Algorithm 1: Test-time Purification
Input: xadv: input (adversarial) data; α: learning rate; T: number of purification iterations; ϵth: purification budget.
Output: xpfy: purified data; s: purification score.
1: procedure PURIFY(xadv, α, T, ϵth)
2:   ϵpfy ← U[−ϵth, ϵth]  ▷ random initialization
3:   for t = 1, 2, ..., T do
4:     ϵpfy ← ϵpfy + α · sign(∇ϵpfy F(xadv + ϵpfy))
5:     ϵpfy ← min(max(ϵpfy, −ϵth), ϵth)
6:     ϵpfy ← min(max(xadv + ϵpfy, 0), 1) − xadv
7:   xpfy ← xadv + ϵpfy  ▷ purified data
8:   s ← F(xpfy)  ▷ purification score
9:   return xpfy, s
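Algorithm 1 can be sketched in plain NumPy. This is an illustrative reading under stated assumptions, not the authors' released implementation: `F` stands in for the purification score (e.g., the negative reconstruction loss or the ELBO of the VAE), and `grad_F` is assumed to return its gradient with respect to the input.

```python
import numpy as np

def purify(x_adv, F, grad_F, alpha=2/255, T=32, eps_th=8/255, rng=None):
    """Test-time purification (signed-gradient ascent on the score F).

    F(x)      -> scalar purification score (assumption: higher is better)
    grad_F(x) -> gradient of F at x, same shape as x
    """
    rng = np.random.default_rng(rng)
    # line 2: random initialization in the purification budget
    eps = rng.uniform(-eps_th, eps_th, size=x_adv.shape)
    for _ in range(T):
        # line 4: signed gradient ascent step on F
        eps = eps + alpha * np.sign(grad_F(x_adv + eps))
        # line 5: project back into the l_inf purification budget
        eps = np.clip(eps, -eps_th, eps_th)
        # line 6: keep the purified image inside the valid range [0, 1]
        eps = np.clip(x_adv + eps, 0.0, 1.0) - x_adv
    x_pfy = x_adv + eps          # line 7: purified data
    return x_pfy, F(x_pfy)       # line 8: purification score
```

The two clipping steps mirror lines 5-6 of Algorithm 1: the first enforces the purification budget ϵth, the second guarantees the purified sample remains a valid image.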
Open Source Code | Yes | Code is available at: https://github.com/GoL2022/AdvPFY.
Open Datasets | Yes | We first evaluate our method on MNIST (LeCun, Cortes, and Burges 2010), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), and CIFAR-10 (Krizhevsky and Hinton 2009), followed by CIFAR-100 (Krizhevsky and Hinton 2009) and CelebA (64×64 and 128×128) for gender classification (Liu et al. 2015). See Yang et al. (2023) for the dataset details.
Dataset Splits | No | The paper mentions using training and testing data from standard datasets such as MNIST and CIFAR-10, but does not explicitly provide the train/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) for all datasets within the main text.
Hardware Specification | Yes | We evaluate our method on an NVIDIA Tesla P100 GPU in PyTorch.
Software Dependencies | No | The paper mentions software like 'PyTorch', 'Foolbox', and 'Torchattacks' but does not specify their version numbers.
Experiment Setup | Yes | We empirically set the weight of the classification loss (λ in Eq. (3)) to 8. See Yang et al. (2023) for details. Adversarial Attacks: We evaluate our method on standard adversarial attacks and adaptive attacks (multi-objective and BPDA). All attacks are untargeted. For standard adversarial attacks (Eq. (4)), we use Foolbox (Rauber, Brendel, and Bethge 2017) to generate the PGD (ℓ∞) attacks (Madry et al. 2018). We use Croce and Hein (2020) for AutoAttack (ℓ∞, ℓ2). For the adaptive attacks, we use Torchattacks (Kim 2020) for the BPDA-PGD/APGD (ℓ∞) attacks (Athalye, Carlini, and Wagner 2018), and standard PGD (ℓ∞) for the multi-objective attacks. For MNIST and Fashion-MNIST, we report the attack hyperparameters and numerical results in Yang et al. (2023). For SVHN, CIFAR-10/100, and CelebA, we set the ℓ∞ attack budget δth to 8/255 and the ℓ2 attack budget to 0.5. We run 100 iterations with step size 2/255 for PGD (ℓ∞) and 50 iterations with step size 2/255 for the BPDA attack. We also evaluate our ResNet-50 (CIFAR-10) model on the RayS (black-box) attack (Chen and Gu 2020), the FGSM (ℓ∞) attack (Goodfellow, Shlens, and Szegedy 2015), and the C&W (ℓ2) attack (Carlini and Wagner 2017) in Yang et al. (2023), and our defense is effective for these attacks. Test-time Purification: Key hyperparameters and experimental details are provided below, and only the ℓ∞-bounded purification is considered in this work. We initialize the purified signal ϵpfy by sampling from a uniform distribution U[−ϵth, ϵth], where ϵth is the purification budget. We run purification 16 times in parallel with different initializations and select the signal with the best purification score, measured by the reconstruction loss or the ELBO. The step size α is alternated between {1/255, 2/255} for each run. For SVHN, CIFAR-10/100, and CelebA, we set the ℓ∞ purification budget ϵth to 8/255 with 32 iterations.
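The restart-and-select procedure described in the test-time purification setup can be sketched as follows. This is a minimal NumPy illustration, not the released code: `F` and `grad_F` are hypothetical stand-ins for the model's purification score and its gradient, and the 16 runs are shown sequentially rather than in parallel.

```python
import numpy as np

def purify_restarts(x_adv, F, grad_F, n_restarts=16, T=32, eps_th=8/255, seed=0):
    """Run purification with several random initializations and keep the
    candidate with the best (highest) purification score."""
    best_x, best_s = x_adv, -np.inf
    for i in range(n_restarts):
        # alternate the step size between 1/255 and 2/255 across runs
        alpha = 1/255 if i % 2 == 0 else 2/255
        rng = np.random.default_rng(seed + i)
        eps = rng.uniform(-eps_th, eps_th, size=x_adv.shape)  # random init
        for _ in range(T):
            # signed gradient step, then project into the l_inf budget
            eps = np.clip(eps + alpha * np.sign(grad_F(x_adv + eps)),
                          -eps_th, eps_th)
            # keep the purified sample inside the valid image range [0, 1]
            eps = np.clip(x_adv + eps, 0.0, 1.0) - x_adv
        s = F(x_adv + eps)
        if s > best_s:
            best_x, best_s = x_adv + eps, s
    return best_x, best_s
```

The selection step corresponds to choosing, among the 16 parallel runs, the purified signal whose score (reconstruction loss or ELBO) is best.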