MMA Training: Direct Input Space Margin Maximization through Adversarial Training
Authors: Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness. Code and models are available at https://github.com/BorealisAI/mma_training. |
| Researcher Affiliation | Collaboration | 1 Borealis AI, 2 University of Tuebingen, 3 Max Planck Institute for Intelligent Systems |
| Pseudocode | Yes | Algorithm 1 describes the Adaptive Norm PGD Attack (AN-PGD) algorithm. Algorithm 2 summarizes our practical MMA training algorithm. (A hedged sketch of the AN-PGD idea appears after the table.) |
| Open Source Code | Yes | Code and models are available at https://github.com/BorealisAI/mma_training. |
| Open Datasets | Yes | Our experiments empirically confirm our theory and demonstrate MMA training's efficacy on the MNIST and CIFAR10 datasets w.r.t. ℓ∞ and ℓ2 robustness. |
| Dataset Splits | Yes | For all the experiments, we monitor the average margin from AN-PGD on the validation set and choose the model with the largest average margin from the sequence of checkpoints during training. The validation set contains the first 5000 images of the training set. (A hedged split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. It mentions 'computation resources' but gives no specifications. |
| Software Dependencies | No | The paper mentions the 'AdverTorch toolbox (Ding et al., 2019b)' and 'PyTorch' but does not specify version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | For training LeNet5 on all MNIST experiments, for both PGD and MMA training, we use the Adam optimizer with an initial learning rate of 0.0001 and train for 100000 steps with batch size 50. For training WideResNet on CIFAR10 variants, we use stochastic gradient descent with momentum 0.9 and weight decay 0.0002. We train 50000 steps in total with batch size 128. The learning rate is set to 0.3 at step 0, 0.09 at step 20000, 0.03 at step 30000, and 0.009 at step 40000. For models trained on MNIST, we use a 40-step PGD attack with the soft logit margin (SLM) loss defined in Section 3; for CIFAR10 we use 10-step PGD, also with the SLM loss. For both MNIST and CIFAR10, the step size of the PGD attack at training time is 2.5ϵ / (number of steps). (A hedged PyTorch sketch of this setup appears after the table.) |
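
The pseudocode row points to Algorithm 1 (AN-PGD) in the paper, which estimates an example's input-space margin with an adaptive-norm PGD attack. The sketch below only illustrates that general idea: search for the smallest perturbation budget at which a bounded PGD attack succeeds. The helper `pgd_attack`, the function name `estimate_margin`, and the geometric-growth-plus-bisection schedule are assumptions for illustration, not the paper's Algorithm 1 or its released code.

```python
def estimate_margin(model, x, y, pgd_attack,
                    eps_init=0.1, eps_max=1.0, num_bisection_steps=10):
    """Return an upper estimate of the smallest eps at which a PGD attack flips (x, y).

    `pgd_attack(model, x, y, eps=...)` is a hypothetical helper returning True
    if an eps-bounded PGD attack changes the model's prediction.
    """
    lo, hi = 0.0, eps_init
    success = pgd_attack(model, x, y, eps=hi)
    # Grow the budget geometrically until the attack succeeds or eps_max is reached.
    while not success and hi < eps_max:
        lo, hi = hi, min(2.0 * hi, eps_max)
        success = pgd_attack(model, x, y, eps=hi)
    if not success:
        return eps_max  # no adversarial example found within the search range
    # Bisect between the largest failing budget and the smallest succeeding one.
    for _ in range(num_bisection_steps):
        mid = 0.5 * (lo + hi)
        if pgd_attack(model, x, y, eps=mid):
            hi = mid
        else:
            lo = mid
    return hi  # approximate margin of (x, y)
```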
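
The dataset-split row quotes the paper's use of the first 5000 training images as a validation set for checkpoint selection. A minimal torchvision sketch of such a split, assuming MNIST, a local `./data` path, and a plain tensor transform as placeholders:

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Hold out the first 5000 training images as a validation set, as described
# in the quoted setup; dataset path and transform are illustrative choices.
train_full = datasets.MNIST("./data", train=True, download=True,
                            transform=transforms.ToTensor())
val_set = Subset(train_full, range(5000))
train_set = Subset(train_full, range(5000, len(train_full)))
```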
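
The experiment-setup row lists optimizers, a learning-rate schedule, and the PGD step-size rule. The sketch below only translates those reported values into PyTorch; the `model` stand-in, the helper names, and the omission of the MMA loss and training loop are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # placeholder for LeNet5 (MNIST) or WideResNet (CIFAR10)

# MNIST / LeNet5: Adam, initial lr 1e-4, 100000 steps, batch size 50.
mnist_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# CIFAR10 / WideResNet: SGD with momentum 0.9 and weight decay 2e-4,
# 50000 steps, batch size 128, piecewise-constant learning rate.
cifar_optimizer = torch.optim.SGD(model.parameters(), lr=0.3,
                                  momentum=0.9, weight_decay=2e-4)

def cifar_lr(step: int) -> float:
    """Learning-rate schedule reported for the CIFAR10 runs."""
    if step < 20_000:
        return 0.3
    if step < 30_000:
        return 0.09
    if step < 40_000:
        return 0.03
    return 0.009

def pgd_step_size(eps: float, num_steps: int) -> float:
    """Reported per-step PGD size at training time: 2.5 * eps / num_steps."""
    return 2.5 * eps / num_steps
```

The 2.5ϵ / num_steps rule gives each PGD run a total step length larger than the diameter of the ϵ-ball, so the iterate is not prevented from reaching its boundary within the step budget.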