Understanding and Improving Fast Adversarial Training
Authors: Maksym Andriushchenko, Nicolas Flammarion
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide the main comparison in Fig. 8 and provide detailed numbers for specific values of ε in Appendix D.3 which also includes an additional evaluation of our models with AutoAttack [8]. First, we notice that all the methods perform almost equally well for small enough ε, i.e. ε ≤ 6/255 on CIFAR-10 and ε ≤ 4/255 on SVHN. However, the performance for larger ε varies a lot depending on the method due to catastrophic overfitting. |
| Researcher Affiliation | Academia | Maksym Andriushchenko, EPFL, Theory of Machine Learning Lab, maksym.andriushchenko@epfl.ch; Nicolas Flammarion, EPFL, Theory of Machine Learning Lab, nicolas.flammarion@epfl.ch |
| Pseudocode | No | The paper describes methods through textual explanation and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of our experiments is available at https://github.com/tml-epfl/understanding-fast-adv-training. |
| Open Datasets | Yes | Experimental setup. Unless mentioned otherwise, we perform training on PreActResNet-18 [16] with cyclic learning rates [37] and half-precision training [24] following the setup of [47]. We evaluate adversarial robustness using the PGD-50-10 attack, i.e. with 50 iterations and 10 restarts with step size α = ε/4 following [47]. More experimental details are specified in Appendix B. We train these methods using PreActResNet-18 [16] with ℓ∞-radii ε ∈ {1/255, . . . , 16/255} on CIFAR-10 for 30 epochs and ε ∈ {1/255, . . . , 12/255} on SVHN for 15 epochs. |
| Dataset Splits | No | The paper does not explicitly state the use of a validation dataset split, nor does it provide specific percentages or counts for training, validation, and test splits. |
| Hardware Specification | Yes | Training with GradAlign leads on average to a 3× slowdown on an NVIDIA V100 GPU compared to FGSM training, which is due to the use of double backpropagation (see [9] for a detailed analysis). |
| Software Dependencies | No | The paper mentions techniques and model architectures like "PreActResNet-18", "cyclic learning rates", and "half-precision training", but does not specify any software libraries or tools with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | Experimental setup. Unless mentioned otherwise, we perform training on PreActResNet-18 [16] with cyclic learning rates [37] and half-precision training [24] following the setup of [47]. We evaluate adversarial robustness using the PGD-50-10 attack, i.e. with 50 iterations and 10 restarts with step size α = ε/4 following [47]. More experimental details are specified in Appendix B. We train these methods using PreActResNet-18 [16] with ℓ∞-radii ε ∈ {1/255, . . . , 16/255} on CIFAR-10 for 30 epochs and ε ∈ {1/255, . . . , 12/255} on SVHN for 15 epochs. The only exception is AT for Free [34], which we train for 96 epochs on CIFAR-10 and 45 epochs on SVHN, which was necessary to get comparable results to the other methods. |
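The evaluation protocol quoted above (PGD-50-10 with step size α = ε/4) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: `loss_fn` and `grad_fn` are hypothetical stand-ins for a model's loss and its input gradient, and the clipping to [0, 1] assumes normalized pixel values.

```python
import numpy as np

def pgd_linf(x, loss_fn, grad_fn, eps, n_iter=50, n_restarts=10, alpha=None, seed=0):
    """l-inf PGD with random restarts, matching the paper's PGD-50-10 setup:
    50 iterations, 10 restarts, step size alpha = eps/4."""
    rng = np.random.default_rng(seed)
    if alpha is None:
        alpha = eps / 4
    best_x, best_loss = x, loss_fn(x)
    for _ in range(n_restarts):
        # random start inside the l-inf ball of radius eps around x
        x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
        for _ in range(n_iter):
            x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascent step on the loss
            x_adv = np.clip(x_adv, x - eps, x + eps)         # project back onto the ball
            x_adv = np.clip(x_adv, 0.0, 1.0)                 # keep a valid pixel range
        # keep the restart that achieves the highest loss
        if loss_fn(x_adv) > best_loss:
            best_x, best_loss = x_adv, loss_fn(x_adv)
    return best_x, best_loss

# Toy usage with a linear loss w.x, whose input gradient is just w:
x = np.full(4, 0.5)
w = np.array([1.0, -1.0, 1.0, -1.0])
x_adv, adv_loss = pgd_linf(x, lambda z: float(w @ z), lambda z: w, eps=8 / 255)
```

For this linear toy loss, PGD drives each coordinate to the corner x ± ε of the ℓ∞ ball, so the adversarial loss rises from 0 to 4ε; a real evaluation would plug in a network's cross-entropy loss and autograd gradient instead.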