Towards GAN Benchmarks Which Require Generalization

Authors: Ishaan Gulrajani, Colin Raffel, Luke Metz

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we present experiments evaluating our CNN divergence's ability to assess generalization. We trained a WGAN-GP (Gulrajani et al., 2017) on the CIFAR-10 dataset and evaluated the resulting samples using IS, FID, and the CNN divergence, using both p̂_train and p̂_test as our collections of samples from p. To compare the resulting scores to training set memorization, we also evaluated each metric on p̂_train itself. The results are shown in Table 1. In Table 2 we report the smallest n where memorizing n training set images scores better than the GAN. We train 64 GAN models with randomly-chosen hyperparameters on the training set and evaluate their CNN divergence with respect to both test sets. We plot the results in Figure 2. We train a somewhat large (18M parameters) GAN on a 50,000-image subset of 32×32 ImageNet. Every 2000 iterations, we evaluate three CNN divergences: first, with respect to a held-out test set of 10,000 images; second, another independent test set of the same size (to verify that the variance with respect to the choice of test set images is negligible); and last, a 10,000-image subset of the training set (we use a subset to eliminate bias from the dataset size). Each of the 300 resulting CNN divergence evaluations was run completely from scratch. We plot the results in Figure 3. To test whether the CNN divergence prefers models trained on a similar objective, we use it to evaluate several different types of generative models. We train 3 models: a PixelCNN++ (Salimans et al., 2017), a ResNet VAE with Inverse Autoregressive Flow (IAF) (Kingma et al., 2016), and a DCGAN (Radford et al., 2015) trained with the WGAN-GP objective (Gulrajani et al., 2017). (Section 5; an illustrative sketch of this from-scratch critic evaluation appears below the table.)
Researcher Affiliation | Industry | Ishaan Gulrajani (Google Brain, igul222@gmail.com); Colin Raffel (Google Brain, craffel@gmail.com); Luke Metz (Google Brain, lmetz@google.com)
Pseudocode | No | The paper describes the architecture and training procedures in narrative text but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | To facilitate future work on NNDs, we make an example implementation of the CNN divergence available. (Footnote 3: https://github.com/google-research/google-research/tree/master/towards_gan_benchmarks)
Open Datasets | Yes | We trained a WGAN-GP (Gulrajani et al., 2017) on the CIFAR-10 dataset... (Section 5); on the 32×32 ImageNet dataset (Oord et al., 2016)... (Section 5); ImageNet (Deng et al., 2009) (Appendix B).
Dataset Splits | No | The paper mentions 'training (n=50,000), small test (n=10,000), and large test (n=1,290,000) sets' and 'a held-out test set of 10,000 images' for 32×32 ImageNet, but it does not explicitly define a separate validation set or give split percentages/counts for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper describes the architecture and objective used (e.g., 'DCGAN discriminator', 'WGAN-GP objective') and provides a GitHub link which might imply TensorFlow usage, but it does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | We train for 100,000 iterations using minibatches of size 256 with a learning rate of 2×10⁻⁴. Our final loss value is computed after training, using an exponential moving average of model weights over training with a coefficient of 0.999. (Appendix D) Training details are given in the appendix. (Section 5) (See the weight-EMA sketch below the table.)
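
For a concrete picture of the evaluation protocol quoted in the Research Type row, the sketch below trains a small convolutional critic from scratch with the WGAN-GP objective to distinguish generated samples from held-out images, then reports the resulting critic gap as a divergence estimate. It is written against PyTorch purely for illustration; the authors' released implementation lives in the google-research repository linked in the Open Source Code row, and the architecture, step count, and optimizer settings here are placeholder assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SmallCritic(nn.Module):
    """Toy DCGAN-style critic for 32x32 RGB images (illustrative, not the paper's exact network)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 1),
        )

    def forward(self, x):
        return self.net(x)

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on random interpolates."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def cnn_divergence(gen_samples, test_samples, steps=1000, batch_size=64, device="cpu"):
    """Train a fresh critic from scratch, then return E[D(test)] - E[D(gen)] as the score."""
    critic = SmallCritic().to(device)
    opt = torch.optim.Adam(critic.parameters(), lr=2e-4, betas=(0.0, 0.9))
    for _ in range(steps):
        real = test_samples[torch.randint(len(test_samples), (batch_size,))].to(device)
        fake = gen_samples[torch.randint(len(gen_samples), (batch_size,))].to(device)
        loss = critic(fake).mean() - critic(real).mean() + 10.0 * gradient_penalty(critic, real, fake)
        opt.zero_grad()
        loss.backward()
        opt.step()

    def mean_score(x):
        with torch.no_grad():
            return torch.cat([critic(x[i:i + 256].to(device)) for i in range(0, len(x), 256)]).mean()

    return (mean_score(test_samples) - mean_score(gen_samples)).item()
```

The memorization comparison described in the quote can be approximated with the same function by passing a subset of training images as gen_samples, i.e., scoring a "model" that simply memorizes n training images.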
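The Experiment Setup row quotes an exponential moving average of model weights (coefficient 0.999) used to compute the final loss value. The snippet below is a minimal sketch of that kind of weight EMA, again assuming PyTorch; the class and usage names are hypothetical and are not taken from the authors' code.

```python
import copy
import torch

class WeightEMA:
    """Shadow copy of a model whose parameters track an exponential moving average of the live model."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * live
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# Illustrative usage inside a training loop (hyperparameters quoted from Appendix D):
#   ema = WeightEMA(critic, decay=0.999)
#   for step in range(100_000):          # 100k iterations, minibatch size 256, lr 2e-4
#       ...one optimization step on the critic...
#       ema.update(critic)
#   final_loss = evaluate(ema.shadow)    # the reported final loss uses the EMA weights
```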