Adversarial Machine Learning at Scale

Authors: Alexey Kurakin, Ian J. Goodfellow, Samy Bengio

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this research, we apply adversarial training to ImageNet (Russakovsky et al., 2014). Our contributions include: (1) recommendations for how to successfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, and (4) resolution of a label leaking effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process. From Section 4 (Experiments): We adversarially trained an Inception v3 model (Szegedy et al., 2015) on ImageNet. (A sketch of the single-step attack appears after the table.)
Researcher Affiliation | Industry | Alexey Kurakin (Google Brain, kurakin@google.com), Ian J. Goodfellow (OpenAI, ian@openai.com), Samy Bengio (Google Brain, bengio@google.com)
Pseudocode | Yes | Algorithm 1: Adversarial training of network N. The size of the training minibatch is m; the number of adversarial images in the minibatch is k. 1: Randomly initialize network N. 2: repeat 3: Read minibatch B = {X^1, ..., X^m} from the training set. 4: Generate k adversarial examples {X^1_adv, ..., X^k_adv} from the corresponding clean examples {X^1, ..., X^k} using the current state of the network N. 5: Make a new minibatch B = {X^1_adv, ..., X^k_adv, X^{k+1}, ..., X^m}. 6: Do one training step of network N using minibatch B. 7: until training converged. (A runnable sketch of this loop appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | We adversarially trained an Inception v3 model (Szegedy et al., 2015) on the ImageNet dataset (Russakovsky et al., 2014), significantly increasing robustness against adversarial examples generated by the fast gradient sign method (Goodfellow et al., 2014) as well as other one-step methods.
Dataset Splits | Yes | For evaluation we used the ImageNet validation set, which contains 50,000 images and does not intersect with the training set.
Hardware Specification | No | The paper states, 'All experiments were done using synchronous distributed training on 50 machines, with a minibatch of 32 examples on each machine,' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions TensorFlow: 'In TensorFlow this could be achieved by tf.abs(tf.truncated_normal(shape, mean=0, stddev=8)).' However, it does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | We used λ = 0.3, m = 32, and k = 16. ... We observed that the network tends to reach maximum accuracy at around 130k to 150k iterations. ... Similar to Szegedy et al. (2015), we used the RMSProp optimizer for training. We used a learning rate of 0.045 except where otherwise indicated. ... We experimented with delaying adversarial training by 0, 10k, 20k, and 40k iterations. (The role of λ, m, and k in the training loss is sketched after the table.)
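
The attack referenced throughout the table is the one-step fast gradient sign method (FGSM). Below is a minimal NumPy sketch of it, together with the per-image epsilon sampling that the Software Dependencies row quotes (tf.abs(tf.truncated_normal(shape, mean=0, stddev=8))). The grad_fn callable, the [0, 255] pixel range, and the clipping used to approximate the truncated normal are assumptions, not details confirmed by the excerpts; per the label leaking observation in the Research Type row, computing the perturbation from the model's predicted label rather than the true label is one way to avoid that effect.

    import numpy as np

    def fgsm(x, y, grad_fn, eps):
        """One-step fast gradient sign attack (Goodfellow et al., 2014).

        x       : clean images, assumed to use the [0, 255] pixel range
        y       : labels fed to the loss; using the model's own predicted labels
                  here sidesteps the label leaking effect noted in the paper
        grad_fn : hypothetical callable returning dLoss/dx for (x, y)
        eps     : per-image perturbation magnitude, shape (batch, 1, 1, 1)
        """
        x_adv = x + eps * np.sign(grad_fn(x, y))
        return np.clip(x_adv, 0.0, 255.0)

    def sample_eps(batch_size, stddev=8.0):
        """Rough NumPy analogue of tf.abs(tf.truncated_normal(shape, mean=0, stddev=8)):
        draw N(0, stddev), clip to two standard deviations, take the magnitude."""
        e = np.clip(np.random.randn(batch_size) * stddev, -2 * stddev, 2 * stddev)
        return np.abs(e).reshape(batch_size, 1, 1, 1)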
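The Pseudocode row's Algorithm 1 translates almost directly into a training loop. The sketch below assumes a hypothetical model object exposing input_grad(x, y) (gradient of the loss with respect to the input) and train_step(x, y), plus the fgsm and sample_eps helpers above; it illustrates the minibatch mixing only and is not the authors' implementation.

    import numpy as np

    def adversarial_training(model, data_iter, k=16):
        # Algorithm 1: repeat over minibatches until training converges.
        for batch_x, batch_y in data_iter:                 # read minibatch of size m
            eps = sample_eps(k)                            # per-image perturbation size
            x_adv = fgsm(batch_x[:k], batch_y[:k],         # attack the first k examples
                         model.input_grad, eps)            # against the current network
            mixed_x = np.concatenate([x_adv, batch_x[k:]], axis=0)
            model.train_step(mixed_x, batch_y)             # one step on the mixed minibatch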
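The Experiment Setup row quotes λ = 0.3, m = 32, and k = 16 without showing how λ enters the objective. A natural way to combine the two halves of the mixed minibatch, and the one this sketch assumes, is to down-weight the adversarial per-example losses by λ and normalise by m − k + λk; treat that normalisation as an assumption rather than a quoted formula.

    def weighted_loss(clean_losses, adv_losses, lam=0.3):
        # clean_losses: per-example losses on the m - k clean images
        # adv_losses:   per-example losses on the k adversarial images
        m = len(clean_losses) + len(adv_losses)
        k = len(adv_losses)
        total = sum(clean_losses) + lam * sum(adv_losses)
        return total / (m - k + lam * k)

Under this assumed weighting, with m = 32, k = 16, and λ = 0.3 the denominator is 16 + 0.3 × 16 = 20.8, so the adversarial half of the minibatch contributes roughly a quarter of the gradient signal.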