A Variational Inequality Perspective on Generative Adversarial Networks
Authors: Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, Simon Lacoste-Julien
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our goal in this experimental section is not to provide new state-of-the-art results with architectural improvements or a new GAN formulation, but to show that using the techniques (with theoretical guarantees in the monotone case) that we introduced earlier allows us to optimize standard GANs in a better way. |
| Researcher Affiliation | Collaboration | Mila & DIRO, University of Montreal; Canada CIFAR AI Chair; Facebook Artificial Intelligence Research |
| Pseudocode | Yes | Algorithm 1 Avg SGD; Algorithm 2 Avg Extra SGD; Algorithm 3 Avg Past Extra SGD; Algorithm 4 Extra-Adam; Algorithm 5 Re-used mini-batches for stochastic extrapolation (Re Extra SGD) |
| Open Source Code | Yes | Code available at https://gauthiergidel.github.io/projects/vip-gan.html. |
| Open Datasets | Yes | We evaluate the proposed techniques in the context of GAN training, which is a challenging stochastic optimization problem where the objectives of both players are non-convex. We propose to evaluate the Adam variants of the different optimization algorithms (see Alg. 4 for Adam with extrapolation) by training two different architectures on the CIFAR10 dataset (Krizhevsky and Hinton, 2009). |
| Dataset Splits | Yes | We evaluate the proposed techniques in the context of GAN training, which is a challenging stochastic optimization problem where the objectives of both players are non-convex. We propose to evaluate the Adam variants of the different optimization algorithms (see Alg. 4 for Adam with extrapolation) by training two different architectures on the CIFAR10 dataset (Krizhevsky and Hinton, 2009). |
| Hardware Specification | Yes | all experiments were run on a NVIDIA Quadro GP100 GPU |
| Software Dependencies | No | The paper mentions using Adam and SGD optimizers but does not provide specific version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python, CUDA). |
| Experiment Setup | Yes | For each algorithm, we did an extensive search over the hyperparameters of Adam. We fixed β1 = 0.5 and β2 = 0.9 for all methods as they seemed to perform well. We note that as proposed by Heusel et al. (2017), it is quite important to set different learning rates for the generator and discriminator. Experiments were run with 5 random seeds for 500,000 updates of the generator. [...] (DCGAN) WGAN hyperparameters: batch size = 64; number of generator updates = 500,000; Adam β1 = 0.5; Adam β2 = 0.9; weight clipping for the discriminator = 0.01; learning rate for generator = 2×10⁻⁵ (for Adam1, Adam5, Past Extra Adam, Optimistic Adam) or 5×10⁻⁵ (for Extra Adam); learning rate for discriminator = 2×10⁻⁴ (for Adam1, Adam5, Past Extra Adam, Optimistic Adam) or 5×10⁻⁴ (for Extra Adam); β for EMA = 0.999. (Minimal sketches of the extrapolation step and of the EMA of generator weights follow the table.) |
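
The extrapolation methods listed in the Pseudocode row (Avg Extra SGD, Avg Past Extra SGD, Extra-Adam, Re Extra SGD) share the same two-step update: take a look-ahead gradient step, then re-evaluate the gradients at the look-ahead point and apply them from the original iterate. As a rough illustration only (not the paper's code), the sketch below runs this extragradient step on a toy bilinear saddle-point problem min_x max_y x·y, where plain simultaneous gradient steps diverge; the variable names, step size, and iteration count are our own choices.

```python
# Sketch of the extragradient ("extrapolation") update on min_x max_y x*y.
# Illustrative only: the toy problem and all names here are ours, not the paper's.
x, y = 1.0, 1.0      # initial iterates for the two players
lr = 0.1             # step size (arbitrary for this toy problem)

for step in range(1000):
    # 1) Extrapolation: look-ahead gradient step from the current iterate.
    gx, gy = y, -x                       # descent direction for x, ascent for y
    x_half, y_half = x - lr * gx, y - lr * gy
    # 2) Update: gradients evaluated at the look-ahead point, applied from the
    #    ORIGINAL iterate -- this is what separates extragradient from two plain steps.
    gx, gy = y_half, -x_half
    x, y = x - lr * gx, y - lr * gy

print(x, y)  # both approach the saddle point (0, 0); simultaneous gradient steps spiral outward
```

In the GAN experiments the same two-step pattern is applied to the generator and discriminator parameters, with the plain gradient steps replaced by Adam updates as in Alg. 4 (Extra-Adam).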
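
The hyperparameter list above includes β = 0.999 for an EMA, i.e. an exponential moving average of the generator weights used when evaluating the averaged iterates. A minimal PyTorch sketch of such an EMA, assuming a PyTorch generator module (the helper name `update_ema` and the `nn.Linear` stand-in generator are hypothetical), is:

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def update_ema(ema_G, G, beta=0.999):
    # theta_ema <- beta * theta_ema + (1 - beta) * theta, applied after each generator update
    for p_ema, p in zip(ema_G.parameters(), G.parameters()):
        p_ema.mul_(beta).add_(p, alpha=1 - beta)

G = nn.Linear(128, 3 * 32 * 32)   # stand-in for the DCGAN generator on CIFAR10
ema_G = copy.deepcopy(G)          # frozen copy that accumulates the averaged weights
for p in ema_G.parameters():
    p.requires_grad_(False)

# ... inside the training loop, after each generator step:
update_ema(ema_G, G, beta=0.999)
```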