MMA Training: Direct Input Space Margin Maximization through Adversarial Training
Authors: Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness. Code and models are available at https://github.com/BorealisAI/mma_training. |
| Researcher Affiliation | Collaboration | 1 Borealis AI, 2 University of Tuebingen, 3 Max Planck Institute for Intelligent Systems |
| Pseudocode | Yes | Algorithm 1 describes the Adaptive Norm PGD Attack (AN-PGD) algorithm. Algorithm 2 summarizes our practical MMA training algorithm. (A hedged sketch of the AN-PGD idea appears after the table.) |
| Open Source Code | Yes | Code and models are available at https://github.com/BorealisAI/mma_training. |
| Open Datasets | Yes | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness. |
| Dataset Splits | Yes | For all the experiments, we monitor the average margin from AN-PGD on the validation set and choose the model with the largest average margin from the sequence of checkpoints during training. The validation set contains the first 5000 images of the training set. (A hedged split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. It mentions 'computation resources' but gives no specifications. |
| Software Dependencies | No | The paper mentions the 'AdverTorch toolbox (Ding et al., 2019b)' and 'PyTorch' but does not specify version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | For training LeNet5 on all MNIST experiments, for both PGD and MMA training, we use the Adam optimizer with an initial learning rate of 0.0001 and train for 100000 steps with batch size 50. For training WideResNet on CIFAR10 variants, we use stochastic gradient descent with momentum 0.9 and weight decay 0.0002. We train 50000 steps in total with batch size 128. The learning rate is set to 0.3 at step 0, 0.09 at step 20000, 0.03 at step 30000, and 0.009 at step 40000. For models trained on MNIST, we use a 40-step PGD attack with the soft logit margin (SLM) loss defined in Section 3; for CIFAR10 we use 10-step PGD, also with the SLM loss. For both MNIST and CIFAR10, the step size of the PGD attack at training time is 2.5ϵ / (number of steps). (A hedged PyTorch sketch of this setup appears after the table.) |
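
The pseudocode row points to Algorithm 1 (AN-PGD) in the paper, which estimates an example's input-space margin with an adaptive-norm PGD attack. The sketch below only illustrates that general idea: search for the smallest perturbation budget at which a bounded PGD attack succeeds. The helper `pgd_attack`, the function name `estimate_margin`, and the geometric-growth-plus-bisection schedule are assumptions for illustration, not the paper's Algorithm 1 or its released code.

```python
def estimate_margin(model, x, y, pgd_attack,
                    eps_init=0.1, eps_max=1.0, num_bisection_steps=10):
    """Return an upper estimate of the smallest eps at which a PGD attack flips (x, y).

    `pgd_attack(model, x, y, eps=...)` is a hypothetical helper returning True
    if an eps-bounded PGD attack changes the model's prediction.
    """
    lo, hi = 0.0, eps_init
    success = pgd_attack(model, x, y, eps=hi)
    # Grow the budget geometrically until the attack succeeds or eps_max is reached.
    while not success and hi < eps_max:
        lo, hi = hi, min(2.0 * hi, eps_max)
        success = pgd_attack(model, x, y, eps=hi)
    if not success:
        return eps_max  # no adversarial example found within the search range
    # Bisect between the largest failing budget and the smallest succeeding one.
    for _ in range(num_bisection_steps):
        mid = 0.5 * (lo + hi)
        if pgd_attack(model, x, y, eps=mid):
            hi = mid
        else:
            lo = mid
    return hi  # approximate margin of (x, y)
```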
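
The dataset-split row quotes the paper's use of the first 5000 training images as a validation set for checkpoint selection. A minimal torchvision sketch of such a split, assuming MNIST, a local `./data` path, and a plain tensor transform as placeholders:

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Hold out the first 5000 training images as a validation set, as described
# in the quoted setup; dataset path and transform are illustrative choices.
train_full = datasets.MNIST("./data", train=True, download=True,
                            transform=transforms.ToTensor())
val_set = Subset(train_full, range(5000))
train_set = Subset(train_full, range(5000, len(train_full)))
```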
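
The experiment-setup row lists optimizers, a learning-rate schedule, and the PGD step-size rule. The sketch below only translates those reported values into PyTorch; the `model` stand-in, the helper names, and the omission of the MMA loss and training loop are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # placeholder for LeNet5 (MNIST) or WideResNet (CIFAR10)

# MNIST / LeNet5: Adam, initial lr 1e-4, 100000 steps, batch size 50.
mnist_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# CIFAR10 / WideResNet: SGD with momentum 0.9 and weight decay 2e-4,
# 50000 steps, batch size 128, piecewise-constant learning rate.
cifar_optimizer = torch.optim.SGD(model.parameters(), lr=0.3,
                                  momentum=0.9, weight_decay=2e-4)

def cifar_lr(step: int) -> float:
    """Learning-rate schedule reported for the CIFAR10 runs."""
    if step < 20_000:
        return 0.3
    if step < 30_000:
        return 0.09
    if step < 40_000:
        return 0.03
    return 0.009

def pgd_step_size(eps: float, num_steps: int) -> float:
    """Reported per-step PGD size at training time: 2.5 * eps / num_steps."""
    return 2.5 * eps / num_steps
```

The 2.5ϵ / num_steps rule gives each PGD run a total step length larger than the diameter of the ϵ-ball, so the iterate is not prevented from reaching its boundary within the step budget.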