Minimax Optimization with Smooth Algorithmic Adversaries

Authors: Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff

ICLR 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | This section presents empirical results evaluating our SGD algorithm (Algorithm 1) for generative adversarial networks (Goodfellow et al., 2014) and adversarial training (Madry et al., 2018). Our results demonstrate that our framework results in stable monotonic improvement during training and converges to desirable solutions in both GAN and adversarial training problems.

Researcher Affiliation | Collaboration | Tanner Fiez, Lillian J. Ratliff (University of Washington, Seattle; {fiezt, ratliffl}@uw.edu); Chi Jin (Princeton University; chij@princeton.edu); Praneeth Netrapalli (Google Research, India; pnetrapalli@google.com)

Pseudocode | Yes | Algorithm 1: Stochastic subgradient descent (SGD)

Open Source Code | Yes | The code for the experiments is included in the supplementary material with instructions on how to run.

Open Datasets | Yes | We run an adversarial training experiment with the MNIST dataset.

Dataset Splits | No | The paper mentions using a 'training set' and evaluating 'test classification accuracy' but does not provide specific percentages or sample counts for training, validation, or test splits. It refers to 'standard adversarial training' but does not specify the splits used.

Hardware Specification | Yes | For the experiments with neural network models we used two Nvidia GeForce GTX 1080 Ti GPUs and the PyTorch higher library (Deleu et al., 2019) to compute f(θ, A(θ)).

Software Dependencies | No | The paper mentions the 'PyTorch higher library (Deleu et al., 2019)' but does not provide specific version numbers for PyTorch or the higher library.

Experiment Setup | Yes | The learning rates for both the generator and the discriminator are η = 0.01. The minimization procedure has a fixed learning rate of η1 = 0.0001 and the maximization procedure runs for T = 10 steps with a fixed learning rate of η2 = 4. We compare Algorithm 1 with usual adversarial training (Madry et al., 2018), which descends ∇θ f(θ, A(θ)) instead of ∇f(θ, A(θ)), and a baseline of standard training without adversarial training. For each algorithm, we train for 100 passes over the training set using a batch size of 50.
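The Pseudocode and Experiment Setup rows describe the core of Algorithm 1: descend f(θ, A(θ)), where A(θ) is T steps of gradient ascent on the inner variable, differentiating through those ascent steps (the role the higher library plays in the paper's experiments). A minimal sketch of that idea in plain PyTorch autograd, not the authors' code, is below; it reuses the quoted step sizes η1 = 0.0001, η2 = 4, T = 10, but the toy objective f and the concavity constant c are assumptions chosen so the inner ascent is stable at η2 = 4.

```python
# Minimal sketch (not the authors' code) of Algorithm 1's core loop:
# descend f(theta, A(theta)), where A(theta) is T steps of gradient
# ascent on the inner variable, differentiating THROUGH those steps.
import torch

eta1, eta2, T = 1e-4, 4.0, 10  # step sizes and inner steps quoted in the setup row
c = 0.25                       # toy concavity constant (assumption): keeps the
                               # inner ascent stable at eta2 = 4, since |1 - eta2*c| < 1

def f(theta, alpha):
    # Placeholder smooth objective, strongly concave in alpha (assumption);
    # stands in for the GAN / adversarial-training loss.
    return (theta * alpha).sum() - 0.5 * c * (alpha ** 2).sum()

theta = torch.tensor([1.0, -1.0], requires_grad=True)
theta0 = theta.detach().clone()

for _ in range(100):
    # A(theta): T gradient-ascent steps kept on the autograd graph
    alpha = torch.zeros_like(theta).requires_grad_()
    for _ in range(T):
        g = torch.autograd.grad(f(theta, alpha), alpha, create_graph=True)[0]
        alpha = alpha + eta2 * g
    # Total derivative of f(theta, A(theta)) w.r.t. theta: the gradient
    # flows back through the unrolled ascent steps as well.
    grad_theta = torch.autograd.grad(f(theta, alpha), theta)[0]
    with torch.no_grad():
        theta -= eta1 * grad_theta
```

The "usual adversarial training" baseline in the comparison would instead take ∇θ f(θ, A(θ)) with A(θ) held fixed, i.e. compute the outer gradient against a detached `alpha`.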