The Unusual Effectiveness of Averaging in GAN Training

Authors: Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We establish experimentally that both techniques are strikingly effective in the non-convex-concave GAN setting as well. Both improve Inception and FID scores on different architectures and for different GAN objectives. We provide comprehensive experimental results across a range of datasets (mixture of Gaussians, CIFAR-10, STL-10, CelebA, and ImageNet) to demonstrate its effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images. (See the EMA sketch after the table.)
Researcher Affiliation | Academia | Yasin Yazıcı (Nanyang Technological University, NTU); Chuan-Sheng Foo (Institute for Infocomm Research, A*STAR); Stefan Winkler (National University of Singapore, NUS); Kim-Hui Yap (Nanyang Technological University, NTU); Georgios Piliouras (Singapore University of Technology and Design); Vijay Chandrasekhar (Institute for Infocomm Research, A*STAR)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/yasinyazici/EMA_GAN
Open Datasets | Yes | We use both illustrative examples (i.e. mixtures of Gaussians) as well as four commonly used real-world datasets, namely CIFAR-10 (Krizhevsky et al., 2009), STL-10 (Coates et al., 2011), CelebA (Liu et al., 2015), and ImageNet (Russakovsky et al., 2015) to show the effectiveness of averaging.
Dataset Splits | No | The paper uses well-known datasets (e.g., CIFAR-10) but does not explicitly state training/validation/test splits (percentages, sample counts, or a reference to the predefined splits) for its experiments. (See the loader sketch after the table.)
Hardware Specification | No | The paper mentions that computational work was performed on "resources of the National Supercomputing Centre, Singapore" but does not provide specific details such as GPU or CPU models, processor types, or memory specifications.
Software Dependencies | No | The paper mentions using Chainer for evaluation and the Python Optimal Transport package, but it does not give version numbers for these or any other software dependencies needed to replicate the experiments. (See the version-logging snippet after the table.)
Experiment Setup | Yes | For the optimizer, we use ADAM (Kingma & Ba, 2014) with α = 0.0002, β1 = 0.0 and β2 = 0.9. For the GAN objective, we use the original GAN (referring to the non-saturating variant) (Goodfellow et al., 2014) objective and the Wasserstein-1 (Arjovsky et al., 2017) objective, with the Lipschitz constraint satisfied by gradient penalty (Gulrajani et al., 2017). Unless stated otherwise, the objective is the original one, the architecture is conventional, the discriminator-to-generator update ratio is 1, the β value is 0.9999, and the MA starting point is 100k. The maximum number of iterations for any experiment is 500k. All experiments are repeated at least 3 times with random initializations to show that the results are consistent across different runs. Table 11 in Appendix D gives the full hyperparameters: batch size = 64; discriminator learning rate = 0.0002; generator learning rate = 0.0002; ADAM β1 = 0.0; ADAM β2 = 0.9; ADAM ε = 1e-8; EMA β = 0.9999; max iterations = 500,000; WGAN-GP λ = 10.0; WGAN-GP n_dis = 5; MA start point = 100,000; GAN objective = GAN or WGAN-GP; optimizer = ADAM. (See the configuration sketch after the table.)
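
The averaging evaluated throughout the paper is a moving average over generator parameters, with the exponential variant (EMA) updated as θ' ← β·θ' + (1−β)·θ after every generator step. A minimal sketch of that update using the paper's β = 0.9999; the function and variable names here are illustrative, not taken from the authors' repository:

    # EMA over generator parameters: theta_ema <- beta*theta_ema + (1-beta)*theta.
    # update_ema and the stand-in arrays are illustrative, not the authors' code.
    import numpy as np

    def update_ema(theta_ema, theta, beta=0.9999):
        """One EMA step, applied after each generator update."""
        return beta * theta_ema + (1.0 - beta) * theta

    theta = np.random.randn(4)      # stand-in for generator weights
    theta_ema = theta.copy()        # averaged copy, kept outside training
    for _ in range(1000):
        theta += 0.01 * np.random.randn(4)   # stand-in for an optimizer step
        theta_ema = update_ema(theta_ema, theta)

The averaged copy never feeds back into training; it is only the snapshot used to generate samples for evaluation.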
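
On the Dataset Splits point: the datasets used come with predefined splits, so one way to make the split explicit is through a standard loader. A sketch assuming Chainer's built-in CIFAR-10 API (the paper mentions Chainer only for evaluation, so using it for loading is an assumption):

    # Making the predefined CIFAR-10 split explicit via Chainer's loader.
    # The loader choice is an assumption; the paper does not state one.
    from chainer.datasets import get_cifar10

    train, test = get_cifar10()     # predefined 50,000 / 10,000 split
    print(len(train), len(test))    # -> 50000 10000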
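
On the Software Dependencies point, a small replication aid (not from the paper) that records the versions of the two packages it names at run time:

    # Log versions of the dependencies named in the paper: Chainer and
    # the Python Optimal Transport (POT) package, imported as `ot`.
    import sys
    import chainer
    import ot

    print("python :", sys.version.split()[0])
    print("chainer:", chainer.__version__)
    print("POT    :", ot.__version__)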
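
Finally, the Table 11 values from the Experiment Setup row map directly onto optimizer construction. A configuration sketch assuming Chainer's Adam implementation; only the hyperparameter values come from the paper, while make_optimizer and its model argument are placeholders:

    # Table 11 hyperparameters wired into a Chainer Adam optimizer.
    # Values are the paper's; the API choice and helper are assumptions.
    from chainer import optimizers

    ADAM_KWARGS = dict(alpha=0.0002, beta1=0.0, beta2=0.9, eps=1e-8)
    BATCH_SIZE = 64
    EMA_BETA = 0.9999         # EMA decay
    MA_START = 100_000        # iteration at which uniform averaging starts
    MAX_ITER = 500_000        # maximum number of training iterations
    WGAN_GP_LAMBDA = 10.0     # gradient penalty weight (WGAN-GP runs)
    WGAN_GP_N_DIS = 5         # discriminator updates per generator update

    def make_optimizer(model):
        opt = optimizers.Adam(**ADAM_KWARGS)   # same settings for G and D
        opt.setup(model)
        return opt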