Improved Training of Wasserstein GANs

Authors: Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron C. Courville

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models with continuous generators. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms. (Section 5: Experiments)
Researcher Affiliation | Academia | Ishaan Gulrajani (1), Faruk Ahmed (1), Martin Arjovsky (2), Vincent Dumoulin (1), Aaron Courville (1,3); (1) Montreal Institute for Learning Algorithms, (2) Courant Institute of Mathematical Sciences, (3) CIFAR Fellow. Emails: igul222@gmail.com, {faruk.ahmed,vincent.dumoulin,aaron.courville}@umontreal.ca, ma4371@nyu.edu
Pseudocode | Yes | Algorithm 1: WGAN with gradient penalty (a code sketch of the penalty term follows this table).
Open Source Code | Yes | Code for our models is available at https://github.com/igul222/improved_wgan_training.
Open Datasets | Yes | From this set, we sample 200 architectures and train each on 32×32 ImageNet with both WGAN-GP and the standard GAN objectives. ... train six different GAN architectures on the LSUN bedrooms dataset [30]. ... train WGANs with weight clipping and our gradient penalty on CIFAR-10 [13] ... we train a character-level GAN language model on the Google Billion Word dataset [6].
Dataset Splits | Yes | To explore the loss curve's behavior when the network overfits, we train large unregularized WGANs on a random 1000-image subset of MNIST and plot the negative critic loss on both the training and validation sets in Figure 5b.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions specific optimizers (Adam, RMSProp) and normalization schemes (Layer Normalization), but it does not provide version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | Algorithm 1 (WGAN with gradient penalty): We use default values of λ = 10, n_critic = 5, α = 0.0001, β1 = 0, β2 = 0.9. Table 1 (We evaluate WGAN-GP's ability to train the architectures in this set): Nonlinearity (G): [ReLU, LeakyReLU, softplus(2x+2)/2 − 1, tanh]; Nonlinearity (D): [ReLU, LeakyReLU, softplus(2x+2)/2 − 1, tanh]; Depth (G): [4, 8, 12, 20]; Depth (D): [4, 8, 12, 20]; Batch norm (G): [True, False]; Batch norm (D; layer norm for WGAN-GP): [True, False]; Base filter count (G): [32, 64, 128]; Base filter count (D): [32, 64, 128]. (A training-loop sketch with these defaults, and a sampler over this grid, follow the table.)
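
The gradient penalty quoted above (and in Algorithm 1 of the paper) adds λ times the squared deviation of the critic's input-gradient norm from 1, evaluated at points interpolated between real and generated samples. The authors' released code is TensorFlow; the following is only a minimal PyTorch-style sketch of that term, with `critic`, `real`, and `fake` as placeholder names:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Two-sided penalty from Algorithm 1: lambda * (||grad_xhat D(xhat)||_2 - 1)^2,
    averaged over random interpolates xhat between real and generated samples."""
    batch_size = real.size(0)
    # epsilon ~ U[0, 1], one value per example, broadcast over non-batch dims.
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=real.device)
    x_hat = eps * real.detach() + (1.0 - eps) * fake.detach()
    x_hat.requires_grad_(True)
    d_hat = critic(x_hat)
    # Gradient of the critic output with respect to the interpolated inputs.
    grads = torch.autograd.grad(
        outputs=d_hat, inputs=x_hat,
        grad_outputs=torch.ones_like(d_hat),
        create_graph=True)[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```

`create_graph=True` keeps the penalty differentiable with respect to the critic's parameters, which is what lets it act as a regularizer during the critic update.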
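Algorithm 1's outer loop alternates n_critic critic updates with one generator update, using Adam with the defaults quoted in the Experiment Setup row (λ = 10, n_critic = 5, α = 0.0001, β1 = 0, β2 = 0.9). A hedged sketch of one such iteration, assuming the `gradient_penalty` helper above and placeholder `generator`, `critic`, and `sample_real_batch` objects (not the authors' code):

```python
import torch

# Defaults reported in the paper: lambda = 10, n_critic = 5, Adam(alpha=1e-4, beta1=0, beta2=0.9).
LAMBDA, N_CRITIC, LR, BETAS = 10.0, 5, 1e-4, (0.0, 0.9)

def make_optimizers(generator, critic):
    opt_g = torch.optim.Adam(generator.parameters(), lr=LR, betas=BETAS)
    opt_d = torch.optim.Adam(critic.parameters(), lr=LR, betas=BETAS)
    return opt_g, opt_d

def train_step(generator, critic, opt_g, opt_d, sample_real_batch,
               z_dim, batch_size, device):
    # Critic: n_critic updates per generator update (Algorithm 1).
    for _ in range(N_CRITIC):
        real = sample_real_batch(batch_size).to(device)
        z = torch.randn(batch_size, z_dim, device=device)
        fake = generator(z).detach()
        d_loss = (critic(fake).mean() - critic(real).mean()
                  + gradient_penalty(critic, real, fake, LAMBDA))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

    # Generator: minimize -D(G(z)).
    z = torch.randn(batch_size, z_dim, device=device)
    g_loss = -critic(generator(z)).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```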
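The architecture-robustness experiment samples 200 configurations from the Table 1 grid quoted in the Experiment Setup row. A small illustrative sampler over that grid (names such as `ARCH_SPACE` and `sample_architecture` are not from the paper's code):

```python
import random

# Search space quoted from Table 1 of the paper; "shifted_softplus" stands in
# for the nonlinearity softplus(2x + 2)/2 - 1.
ARCH_SPACE = {
    "nonlinearity_g": ["relu", "leaky_relu", "shifted_softplus", "tanh"],
    "nonlinearity_d": ["relu", "leaky_relu", "shifted_softplus", "tanh"],
    "depth_g": [4, 8, 12, 20],
    "depth_d": [4, 8, 12, 20],
    "batch_norm_g": [True, False],
    "batch_norm_d": [True, False],  # layer norm is used instead for WGAN-GP
    "base_filters_g": [32, 64, 128],
    "base_filters_d": [32, 64, 128],
}

def sample_architecture(rng=random):
    """Draw one architecture configuration uniformly from the grid."""
    return {name: rng.choice(choices) for name, choices in ARCH_SPACE.items()}

# e.g. the 200 architectures trained with both WGAN-GP and the standard GAN objective
configs = [sample_architecture() for _ in range(200)]
```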