Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?

Authors: Vikash Sehwag, Saeed Mahloujifar, Tinashe Handina, Sihui Dai, Chong Xiang, Mung Chiang, Prateek Mittal

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Next we use proxy distributions to significantly improve the performance of adversarial training on five different datasets. For example, we improve robust accuracy by up to 7.5% and 6.7% in the ℓ∞ and ℓ2 threat models over baselines that are not using proxy distributions on the CIFAR-10 dataset. We also improve certified robust accuracy by 7.6% on the CIFAR-10 dataset.
Researcher Affiliation | Academia | Princeton University, Caltech, Purdue University
Pseudocode | No | The paper describes its methods narratively but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/inspire-group/proxy-distributions. For further reproducibility, we have also submitted our code with the supplementary material.
Open Datasets | Yes | We consider five datasets, namely CIFAR-10 (Krizhevsky et al., 2014), CIFAR-100 (Krizhevsky et al., 2014), CelebA (Liu et al., 2015), AFHQ (Choi et al., 2020), and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | We consider five datasets, namely CIFAR-10 (Krizhevsky et al., 2014), CIFAR-100 (Krizhevsky et al., 2014), CelebA (Liu et al., 2015), AFHQ (Choi et al., 2020), and ImageNet (Deng et al., 2009). We keep 10,000 synthetic images from this set for validation and train on the rest of them.
Hardware Specification | Yes | Using a 4×RTX 2080 Ti GPU cluster, it takes 23.8 hours to sample one million images on the CIFAR-10 dataset.
Software Dependencies | No | The paper describes the use of various models and tools (e.g., 'ResNet-18', 'AutoAttack'), but it does not specify any software libraries or frameworks with version numbers (e.g., 'PyTorch 1.9' or 'Python 3.8').
Experiment Setup | Yes | We train each network using stochastic gradient descent with a 0.1 learning rate and cosine learning rate decay, weight decay of 5×10^-4, batch size 128, and 200 epochs. We use γ = 0.4 as it achieves the best results (Appendix D). We combine real and synthetic images in a 1:1 ratio in each batch (see the sketch below).
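
To make the reported recipe concrete, below is a minimal PyTorch-style sketch of the setup quoted in the Experiment Setup row: SGD with a 0.1 learning rate, cosine decay, weight decay 5×10^-4, batch size 128, 200 epochs, and batches that mix real and synthetic images in a 1:1 ratio. This is not the authors' released code (see the repository linked above); the datasets, model, momentum value, and loss are placeholders, and the adversarial-training step and the role of γ = 0.4 are omitted.

```python
# Minimal sketch of the quoted training recipe (not the authors' implementation).
import torch
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, BATCH_SIZE = 200, 128

# Placeholder datasets; in the paper these would be CIFAR-10 images and
# synthetic images sampled from the proxy (generative) distribution.
real_data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))
synthetic_data = TensorDataset(torch.randn(1024, 3, 32, 32), torch.randint(0, 10, (1024,)))

# Draw half of each batch from the real set and half from the synthetic set (1:1 ratio).
real_loader = DataLoader(real_data, batch_size=BATCH_SIZE // 2, shuffle=True)
syn_loader = DataLoader(synthetic_data, batch_size=BATCH_SIZE // 2, shuffle=True)

# Stand-in for the ResNet-18 used in the paper.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

# SGD with lr 0.1, weight decay 5e-4, and cosine learning-rate decay over 200 epochs.
# Momentum is not stated in the quoted setup; 0.9 is an assumed common default.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for (x_real, y_real), (x_syn, y_syn) in zip(real_loader, syn_loader):
        # Combine real and synthetic images in a 1:1 ratio in each batch.
        x = torch.cat([x_real, x_syn])
        y = torch.cat([y_real, y_syn])
        # The paper performs adversarial training; the clean loss here is a placeholder.
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```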