A Large-Scale Study on Regularization and Normalization in GANs
Authors: Karol Kurach, Mario Lučić, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we provide a thorough empirical analysis of these competing approaches, and help the researchers and practitioners navigate this space. We first define the GAN landscape: the set of loss functions, normalization and regularization schemes, and the most commonly used architectures. We explore this search space on several modern large-scale datasets by means of hyperparameter optimization, considering both good sets of hyperparameters reported in the literature, as well as those obtained by sequential Bayesian optimization. |
| Researcher Affiliation | Industry | Karol Kurach*1, Mario Lucic*1, Xiaohua Zhai1, Marcin Michalski1, Sylvain Gelly1. *Equal contribution. 1Google Research, Brain Team. Correspondence to: Karol Kurach <kkurach@google.com>, Mario Lucic <lucic@google.com>. Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of tricks. The success in many practical applications coupled with the lack of a measure to quantify the failure modes of GANs resulted in a plethora of proposed losses, regularization and normalization schemes, as well as neural architectures. In this work we take a sober view of the current state of GANs from a practical perspective. We discuss and evaluate common pitfalls and reproducibility issues, open-source our code on Github, and provide pre-trained models on TensorFlow Hub. 1. Introduction Deep generative models are a powerful class of (mostly) unsupervised machine learning models. These models were recently applied to great effect in a variety of applications, including image generation, learned compression, and domain adaptation (Brock et al., 2019; Menick & Kalchbrenner, 2019; Karras et al., 2019; Lucic et al., 2019; Isola et al., 2017; Tschannen et al., 2018). Generative adversarial networks (GANs) (Goodfellow et al., 2014) are one of the main approaches to learning such models in a fully unsupervised fashion. The GAN framework can be viewed as a two-player game where the first player, the generator, is learning to transform some simple input distribution to a complex high-dimensional distribution (e.g. over natural images), such that the second player, the discriminator, cannot tell whether the samples were drawn... |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide reference implementations, including training and evaluation code on Github (www.github.com/google/compare_gan), and provide pre-trained models on TensorFlow Hub (www.tensorflow.org/hub). |
| Open Datasets | Yes | We consider three datasets, namely CIFAR10, CELEBA-HQ-128, and LSUN-BEDROOM. The LSUN-BEDROOM dataset contains slightly more than 3 million images (Yu et al., 2015). We randomly partition the images into a train and test set whereby we use 30588 images as the test set. Secondly, we use the CELEBA-HQ dataset of 30K images (Karras et al., 2018). We use the 128×128×3 version obtained by running the code provided by the authors. We use 3K examples as the test set and the remaining examples as the training set. Finally, we also include the CIFAR10 dataset which contains 70K images (32×32×3), partitioned into 60K training instances and 10K testing instances. |
| Dataset Splits | No | The paper explicitly defines train and test splits for each dataset (60K training and 10K testing instances for CIFAR10, 3K test examples with the remainder used for training for CELEBA-HQ, and a random partition of LSUN-BEDROOM with 30588 test images), but it does not specify a separate validation split for hyperparameter tuning or early stopping. While sequential Bayesian optimization is used, the paper does not detail how a validation set was partitioned from the training data or otherwise used for this purpose. The stated splits are summarized in a sketch after this table. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU models, CPU types, or other computing resources. |
| Software Dependencies | No | The paper mentions using "TensorFlow Hub" and the "Adam optimizer (Kingma & Ba, 2015)", implying TensorFlow as the framework and Adam as the optimizer, but it does not specify version numbers for TensorFlow or any other software dependencies crucial for reproduction. |
| Experiment Setup | Yes | We summarize the fixed hyperparameter settings in Table 1 which contains the good parameters reported in recent publications (Fedus et al., 2018; Miyato et al., 2018; Gulrajani et al., 2017). In particular, we consider the Cartesian product of these parameters to obtain 24 hyperparameter settings to reduce the survivorship bias. Finally, to provide a fair comparison, we perform sequential Bayesian optimization (Srinivas et al., 2010) on the parameter ranges provided in Table 2. We run 12 rounds (i.e. we communicate with the oracle 12 times) of sequential optimization, each with a batch of 10 hyperparameter sets selected based on the FID scores from the results of the previous iterations. As we explore the number of discriminator updates per generator update (1 or 5), this leads to an additional 240 hyperparameter settings which in some cases outperform the previously known hyperparameter settings. The batch size is set to 64 for all the experiments. We use a fixed number of discriminator update steps of 100K for the LSUN-BEDROOM and CELEBA-HQ-128 datasets, and 200K for CIFAR10. We apply the Adam optimizer (Kingma & Ba, 2015). A hedged sketch of this setup follows the table. |
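
Below is a minimal Python sketch of the dataset splits quoted in the Open Datasets and Dataset Splits rows. Only the split sizes come from the paper; the `random_train_test_split` helper and the `lsun_paths` variable in the commented-out usage are hypothetical names introduced here for illustration, mirroring only the described procedure of randomly holding out 30588 LSUN-BEDROOM images as a test set.

```python
import random

# Split sizes as quoted from the paper; no validation split is described.
SPLITS = {
    "cifar10":       {"train": 60_000, "test": 10_000},  # 70K images, 32x32x3
    "celeba_hq_128": {"train": 27_000, "test": 3_000},   # 30K images total, 128x128x3
    "lsun_bedroom":  {"test": 30_588},                    # remainder of ~3M images used for training
}

def random_train_test_split(image_paths, num_test, seed=0):
    """Randomly hold out `num_test` items as a test set (as described for LSUN-BEDROOM)."""
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    return paths[num_test:], paths[:num_test]  # (train, test)

# Hypothetical usage; `lsun_paths` would be the list of downloaded LSUN-BEDROOM images:
# train_paths, test_paths = random_train_test_split(lsun_paths, SPLITS["lsun_bedroom"]["test"])
```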
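The following sketch makes the hyperparameter accounting in the Experiment Setup row concrete. The grid axes and values are hypothetical placeholders (the actual values live in Tables 1 and 2 of the paper); only the counts (24 fixed settings; 12 rounds × 10 sets × 2 discriminator-update choices = 240 Bayesian-optimized settings), the batch size of 64, the Adam optimizer, and the 100K/200K discriminator step budgets come from the quoted text.

```python
import itertools

# Hypothetical placeholder grid: the real axes and values are in Table 1 of the paper.
# The point is only the mechanism: a Cartesian product over the fixed axes yields
# the 24 "good" settings reported in prior work.
fixed_grid = {
    "learning_rate": [1e-4, 2e-4, 1e-3],              # placeholder values
    "betas":         [(0.5, 0.9), (0.5, 0.999),
                      (0.9, 0.999), (0.0, 0.9)],       # placeholder Adam (beta1, beta2) pairs
    "disc_iters":    [1, 5],                           # discriminator updates per generator update
}
fixed_settings = [dict(zip(fixed_grid, combo))
                  for combo in itertools.product(*fixed_grid.values())]
assert len(fixed_settings) == 24  # 3 * 4 * 2

# Bayesian-optimization budget as described in the text:
# 12 sequential rounds x 10 hyperparameter sets per round, each explored with
# disc_iters in {1, 5} -> 240 additional settings.
bayes_settings = 12 * 10 * 2
assert bayes_settings == 240

# Fixed training configuration shared across runs (from the quoted text).
train_config = {
    "batch_size": 64,
    "optimizer": "adam",  # Kingma & Ba, 2015
    "disc_update_steps": {
        "lsun_bedroom": 100_000,
        "celeba_hq_128": 100_000,
        "cifar10": 200_000,
    },
}
```

The two asserts only check that the illustrative grid reproduces the counts stated in the paper; swapping in the actual Table 1 values would change the axes but should preserve the 24-setting product.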