MMA Training: Direct Input Space Margin Maximization through Adversarial Training

Authors: Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness. Code and models are available at https://github.com/BorealisAI/mma_training.
Researcher Affiliation | Collaboration | 1 Borealis AI, 2 University of Tuebingen, 3 Max Planck Institute for Intelligent Systems
Pseudocode | Yes | Algorithm 1 describes the Adaptive Norm PGD Attack (AN-PGD) algorithm. Algorithm 2 summarizes our practical MMA training algorithm.
Open Source Code | Yes | Code and models are available at https://github.com/BorealisAI/mma_training.
Open Datasets | Yes | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness.
Dataset Splits | Yes | For all the experiments, we monitor the average margin from AN-PGD on the validation set and choose the model with the largest average margin from the sequence of checkpoints during training. The validation set contains the first 5000 images of the training set.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. It mentions 'computation resources' but without any specifications.
Software Dependencies | No | The paper mentions the 'AdverTorch toolbox (Ding et al., 2019b)' and 'PyTorch' but does not specify version numbers for these or any other software components used in the experiments.
Experiment Setup | Yes | For training LeNet5 on all MNIST experiments, for both PGD and MMA training, we use the Adam optimizer with an initial learning rate of 0.0001 and train for 100000 steps with batch size 50. For training WideResNet on CIFAR10 variants, we use stochastic gradient descent with momentum 0.9 and weight decay 0.0002. We train 50000 steps in total with batch size 128. The learning rate is set to 0.3 at step 0, 0.09 at step 20000, 0.03 at step 30000, and 0.009 at step 40000. For models trained on MNIST, we use a 40-step PGD attack with the soft logit margin (SLM) loss defined in Section 3; for CIFAR10 we use 10-step PGD, also with the SLM loss. For both MNIST and CIFAR10, the step size of the PGD attack at training time is 2.5ε divided by the number of steps.
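The quoted setup can be made concrete with a small sketch of the two scheduling rules it describes: the piecewise-constant CIFAR10 learning-rate schedule and the PGD step-size heuristic of 2.5ε / num_steps. The helper names `lr_at_step` and `pgd_step_size` are hypothetical, not taken from the paper's codebase; this is a minimal illustration of the stated hyperparameters, not the authors' implementation.

```python
def lr_at_step(step: int) -> float:
    """Piecewise-constant SGD learning rate for the CIFAR10 WideResNet runs,
    as quoted above: 0.3 at step 0, decayed at steps 20000/30000/40000."""
    if step < 20000:
        return 0.3
    elif step < 30000:
        return 0.09
    elif step < 40000:
        return 0.03
    else:
        return 0.009


def pgd_step_size(epsilon: float, num_steps: int) -> float:
    """Training-time PGD attack step size: 2.5 * epsilon / num_steps."""
    return 2.5 * epsilon / num_steps


if __name__ == "__main__":
    # CIFAR10: 10-step PGD; MNIST: 40-step PGD.
    print(lr_at_step(25000))          # 0.09 (second schedule segment)
    print(pgd_step_size(0.3, 10))     # per-step size for an eps=0.3 attack
```

With these rules, e.g. a 40-step MNIST attack at ε = 0.3 uses a per-step size of 2.5 × 0.3 / 40 = 0.01875.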