Adversarial Weight Perturbation Helps Robust Generalization
Authors: Dongxian Wu, Shu-Tao Xia, Yisen Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that AWP indeed brings flatter weight loss landscape and can be easily incorporated into various existing adversarial training methods to further boost their adversarial robustness. Through extensive experiments, we demonstrate that AWP consistently improves the adversarial robustness of state-of-the-art methods by a notable margin. |
| Researcher Affiliation | Academia | Dongxian Wu (1,3), Shu-Tao Xia (1,3), Yisen Wang (2); 1 Tsinghua University; 2 Key Lab. of Machine Perception (MoE), School of EECS, Peking University; 3 PCL Research Center of Networks and Communications, Peng Cheng Laboratory |
| Pseudocode | Yes | The complete pseudo-code of AT-AWP and extensions of AWP to other adversarial training approaches like TRADES, MART and RST are shown in Appendix D ("Appendix D: Pseudo-code"). A hedged sketch of the AT-AWP update follows the table. |
| Open Source Code | Yes | https://github.com/csdongxian/AWP/tree/main/auto_attacks |
| Open Datasets | Yes | We train a PreActResNet-18 [15] on CIFAR-10 [21] for 200 epochs |
| Dataset Splits | No | The paper mentions training and test sets but does not provide specific percentages, sample counts, or explicit descriptions for a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | We train a PreActResNet-18 [15] on CIFAR-10 [21] for 200 epochs using vanilla AT with a piecewise learning rate schedule (initial learning rate is 0.1, divided by 10 at the 100th and 150th epoch). The training and test attacks are both 10-step PGD (PGD-10) with step size 2/255 and maximum L∞ perturbation ϵ = 8/255. For CIFAR-10 under L∞ attack with ϵ = 8/255, we train WideResNet-34-10 for AT, TRADES, and MART, and WideResNet-28-10 for Pre-training and RST, following their original papers. For pre-training, we fine-tune for 50 epochs using a learning rate of 0.001 as in [17]. Other defenses are trained for 200 epochs using SGD with momentum 0.9, weight decay 5×10−4, and an initial learning rate of 0.1 that is divided by 10 at the 100th and 150th epoch. Simple data augmentations such as 32×32 random crop with 4-pixel padding and random horizontal flip are applied. The training attack is PGD-10 with step size 2/255. For AWP, we set γ = 5×10−3. A configuration sketch of these hyperparameters also follows the table. |
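
The Pseudocode row above points to Appendix D for the full AT-AWP procedure. Below is a minimal PyTorch-style sketch of one AT-AWP training step, written from the paper's description rather than from the released repository: the proxy-model bookkeeping, the `awp_lr` ascent rate, and the exact layer-wise normalization are assumptions, not the authors' implementation.

```python
import torch


def awp_training_step(model, proxy, x_adv, y, optimizer, gamma=5e-3, awp_lr=0.01):
    """One AT-AWP update (hedged sketch, not the official code).

    `proxy` must be a second instance of the same architecture as `model`;
    it is used only to compute the adversarial weight perturbation v.
    """
    criterion = torch.nn.CrossEntropyLoss()

    # 1. Ascend on a copy of the current weights to find the perturbation direction.
    proxy.load_state_dict(model.state_dict())
    proxy_opt = torch.optim.SGD(proxy.parameters(), lr=awp_lr)
    loss = -criterion(proxy(x_adv), y)          # negated loss -> gradient ascent
    proxy_opt.zero_grad()
    loss.backward()
    proxy_opt.step()

    # 2. Build v as the layer-wise difference, rescaled so ||v_l|| <= gamma * ||w_l||.
    diff = {}
    for (name, p), (_, q) in zip(model.named_parameters(), proxy.named_parameters()):
        if p.dim() > 1:                          # perturb weight tensors only, not biases
            d = q.detach() - p.detach()
            diff[name] = gamma * p.norm() / (d.norm() + 1e-12) * d

    # 3. Add v, take the usual adversarial-training step at the perturbed weights,
    #    then remove v so the stored parameters stay unperturbed.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in diff:
                p.add_(diff[name])
    optimizer.zero_grad()
    criterion(model(x_adv), y).backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in diff:
                p.sub_(diff[name])
```

The perturbation is added only for the descent step and subtracted immediately afterwards, so the gradient is evaluated at the adversarially perturbed weights while the saved model remains at the unperturbed point.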
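
The Experiment Setup row lists the CIFAR-10 training hyperparameters in prose. The sketch below collects them into runnable PyTorch helpers; the function names (`pgd_attack`, `make_optimizer_and_scheduler`) are illustrative placeholders, not names taken from the AWP repository.

```python
import torch
import torch.nn.functional as F

# Hyperparameters as reported in the paper's CIFAR-10 / L-inf setup.
EPOCHS = 200
LR = 0.1                    # divided by 10 at the 100th and 150th epoch
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4
EPSILON = 8 / 255           # maximum L-inf perturbation
STEP_SIZE = 2 / 255         # PGD step size
PGD_STEPS = 10
AWP_GAMMA = 5e-3


def pgd_attack(model, x, y, eps=EPSILON, alpha=STEP_SIZE, steps=PGD_STEPS):
    """PGD-10 with random start, used for both training and test attacks."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the L-inf ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def make_optimizer_and_scheduler(model):
    """SGD with the reported piecewise schedule: LR 0.1, /10 at epochs 100 and 150."""
    optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                                momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler
```

The model passed to `make_optimizer_and_scheduler` would be a PreActResNet-18 or WideResNet as in the paper; the data augmentations (32×32 random crop with 4-pixel padding, random horizontal flip) are standard `torchvision.transforms` and are omitted here for brevity.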