A Variational Inequality Perspective on Generative Adversarial Networks

Authors: Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our goal in this experimental section is not to provide new state-of-the-art results with architectural improvements or a new GAN formulation, but to show that using the techniques (with theoretical guarantees in the monotone case) that we introduced earlier allows us to optimize standard GANs in a better way.
Researcher Affiliation | Collaboration | (1) Mila & DIRO, University of Montreal; (2) Canada CIFAR AI Chair; (3) Facebook Artificial Intelligence Research
Pseudocode | Yes | Algorithm 1: Avg SGD; Algorithm 2: Avg Extra SGD; Algorithm 3: Avg Past Extra SGD; Algorithm 4: Extra-Adam; Algorithm 5: Re-used mini-batches for stochastic extrapolation (Re Extra SGD). A minimal sketch of the extrapolation step is given after the table.
Open Source Code | Yes | Code available at https://gauthiergidel.github.io/projects/vip-gan.html.
Open Datasets | Yes | We evaluate the proposed techniques in the context of GAN training, which is a challenging stochastic optimization problem where the objectives of both players are non-convex. We propose to evaluate the Adam variants of the different optimization algorithms (see Alg. 4 for Adam with extrapolation) by training two different architectures on the CIFAR10 dataset (Krizhevsky and Hinton, 2009).
Dataset Splits | Yes | Same evidence as the Open Datasets row: the Adam variants of the different optimization algorithms are evaluated by training two different architectures on the CIFAR10 dataset (Krizhevsky and Hinton, 2009). A data-loading sketch is given after the table.
Hardware Specification | Yes | All experiments were run on an NVIDIA Quadro GP100 GPU.
Software Dependencies | No | The paper mentions using Adam and SGD optimizers but does not provide specific version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python, CUDA).
Experiment Setup | Yes | For each algorithm, we did an extensive search over the hyperparameters of Adam. We fixed β1 = 0.5 and β2 = 0.9 for all methods as they seemed to perform well. We note that, as proposed by Heusel et al. (2017), it is quite important to set different learning rates for the generator and discriminator. Experiments were run with 5 random seeds for 500,000 updates of the generator. [...] WGAN (DCGAN) hyperparameters: batch size = 64; number of generator updates = 500,000; Adam β1 = 0.5; Adam β2 = 0.9; weight clipping for the discriminator = 0.01; learning rate for the generator = 2 × 10^-5 (Adam1, Adam5, Past Extra Adam, Optimistic Adam) or 5 × 10^-5 (Extra Adam); learning rate for the discriminator = 2 × 10^-4 (Adam1, Adam5, Past Extra Adam, Optimistic Adam) or 5 × 10^-4 (Extra Adam); β for EMA = 0.999. A code sketch of these settings is given after the table.
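
The extrapolation ("extra") step that distinguishes the Extra SGD variants from plain averaged SGD can be illustrated on a bilinear min-max game of the kind the paper uses to motivate the method. The sketch below is not the authors' code: it is a minimal, deterministic instance of the extrapolation-plus-averaging idea of Algorithm 2 (Avg Extra SGD), with a hand-picked step size.

```python
# Minimal sketch (not the authors' code): the extrapolation step of
# Avg Extra SGD (Algorithm 2), run deterministically on the bilinear game
#   min over theta, max over phi of  f(theta, phi) = theta * phi,
# whose unique saddle point is (0, 0). Plain simultaneous gradient steps
# spiral away on this game; extrapolation plus iterate averaging converges.

def grad(theta, phi):
    # df/dtheta = phi, df/dphi = theta for f(theta, phi) = theta * phi
    return phi, theta

theta, phi = 1.0, 1.0          # initial point
eta = 0.1                      # step size (hand-picked for the sketch)
avg_theta, avg_phi = 0.0, 0.0  # uniform averages of the iterates

for t in range(1, 2001):
    # Extrapolation: take a gradient step to a look-ahead point.
    g_t, g_p = grad(theta, phi)
    theta_e, phi_e = theta - eta * g_t, phi + eta * g_p
    # Update: apply the gradient evaluated at the look-ahead point
    # to the original iterate (the extragradient correction).
    g_t, g_p = grad(theta_e, phi_e)
    theta, phi = theta - eta * g_t, phi + eta * g_p
    # Online uniform averaging (the "Avg" part of the algorithm).
    avg_theta += (theta - avg_theta) / t
    avg_phi += (phi - avg_phi) / t

print(avg_theta, avg_phi)  # both approach 0, the saddle point
```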
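For the dataset rows: the paper trains on CIFAR10 but does not spell out a loader or framework. The snippet below is a sketch assuming a PyTorch/torchvision pipeline; only the batch size of 64 comes from the reported hyperparameters.

```python
# Sketch only: PyTorch/torchvision and the normalization to [-1, 1] are
# assumptions (the paper does not name the framework); the batch size of 64
# comes from the reported hyperparameters. CIFAR10 is loaded through its
# standard training split (50,000 images); no custom split is described.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale images to [-1, 1]
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                           shuffle=True, drop_last=True)
```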
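Finally, the WGAN-specific pieces of the experiment setup (per-player Adam settings, discriminator weight clipping at 0.01, and the EMA of the generator weights with β = 0.999) can be written down compactly. The sketch below uses plain torch.optim.Adam as a stand-in; the actual extrapolation optimizer (Extra-Adam, Algorithm 4) is provided in the authors' released code, and `generator` / `discriminator` are assumed to be existing PyTorch modules.

```python
# Sketch of the reported WGAN settings, not the authors' training loop.
# Assumptions: `generator` and `discriminator` are existing PyTorch modules,
# and plain Adam stands in for the Extra-Adam optimizer of Algorithm 4.
import copy
import torch

def make_optimizers(generator, discriminator):
    # Different learning rates per player (values reported for Extra Adam).
    opt_g = torch.optim.Adam(generator.parameters(), lr=5e-5, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=5e-4, betas=(0.5, 0.9))
    return opt_g, opt_d

def clip_discriminator(discriminator, c=0.01):
    # WGAN weight clipping: keep every discriminator parameter in [-c, c].
    for p in discriminator.parameters():
        p.data.clamp_(-c, c)

def init_ema(generator):
    # The EMA copy starts as a frozen clone of the generator.
    ema = copy.deepcopy(generator)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

def update_ema(ema_generator, generator, beta=0.999):
    # Exponential moving average of the generator weights:
    #   ema <- beta * ema + (1 - beta) * current
    with torch.no_grad():
        for p_ema, p in zip(ema_generator.parameters(), generator.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1.0 - beta)
```

After each generator update, `clip_discriminator` and `update_ema` would be called once; the EMA copy is the averaged generator that the β = 0.999 setting refers to.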