Demystifying MMD GANs

Authors: Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance.
Researcher Affiliation | Academia | Mikołaj Bińkowski, Department of Mathematics, Imperial College London (mikbinkowski@gmail.com); Danica J. Sutherland, Michael Arbel & Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London ({danica.j.sutherland,michael.n.arbel,arthur.gretton}@gmail.com).
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for our models is available at github.com/mbinkowski/MMD-GAN.
Open Datasets | Yes | We compare the quality of samples generated by MMD GAN using various kernels with samples obtained by WGAN-GP (Gulrajani et al., 2017) and Cramér GAN (Bellemare et al., 2017) on four standard benchmark datasets: the MNIST dataset of 28×28 handwritten digits, the CIFAR-10 dataset of 32×32 photos (Krizhevsky, 2009), the LSUN dataset of bedroom pictures resized to 64×64 (Yu et al., 2015), and the CelebA dataset of celebrity face images resized and cropped to 160×160 (Liu et al., 2015).
Dataset Splits | Yes | Quantitative scores are estimated based on 25 000 generator samples (100 000 for MNIST), and compared to 25 000 dataset elements (for LSUN and CelebA) or the standard test set (10 000 images held out from training for MNIST and CIFAR-10). In supervised deep learning, it is common practice to dynamically reduce the learning rate of an optimizer when it has stopped improving the metric on a validation set. (A sketch of the KID estimate behind these scores follows the table.)
Hardware Specification | No | The paper mentions that experiments were run 'on our systems' but does not provide specific details such as CPU/GPU models, memory, or other machine specifications. It refers to network architectures such as 'DCGAN' and 'ResNet generator', but not the hardware they ran on.
Software Dependencies | No | The paper mentions 'scikit-learn' (Pedregosa et al., 2011) and 'tensorflow' in footnotes, but does not provide specific version numbers for these or other key software components used in the experiments.
Experiment Setup | Yes | Each model was trained with a batch size of 64 and 5 discriminator updates per generator update. For CIFAR-10, LSUN and CelebA we trained for 150 000 generator updates, while for MNIST we used 50 000. The initial learning rate was set to 10⁻⁴ and followed the adaptive scheme described in Section 4.1, with KID compared between the current model and the model from 20 000 generator steps earlier (5 000 for MNIST), every 2 000 steps (500 for MNIST). After 3 consecutive failures to improve, the learning rate was halved. This approach allowed us to avoid manually picking a different learning rate for each of the considered models. We scaled the gradient penalty by 1, instead of the 10 recommended by Gulrajani et al. (2017) and Bellemare et al. (2017); we found this to usually work slightly better with MMD models. (A sketch of this learning-rate schedule also follows the table.)
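
The Dataset Splits and Experiment Setup rows both rely on KID scores estimated from finite samples. As a rough illustration, the sketch below computes an unbiased squared-MMD estimate with the cubic polynomial kernel the paper uses for KID; the NumPy implementation and function names are ours, not the authors' released code, and the inputs are assumed to be precomputed Inception features.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3):
    # Cubic polynomial kernel on feature vectors: k(x, y) = (x.y / d + 1)^degree,
    # where d is the feature dimension.
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** degree

def kid_score(feats_real, feats_gen):
    """Unbiased estimate of the squared MMD between two feature samples.

    feats_real: (n, d) array of features from dataset images.
    feats_gen:  (m, d) array of features from generator samples.
    """
    k_xx = polynomial_kernel(feats_real, feats_real)
    k_yy = polynomial_kernel(feats_gen, feats_gen)
    k_xy = polynomial_kernel(feats_real, feats_gen)
    n, m = k_xx.shape[0], k_yy.shape[0]
    # Unbiased estimator: exclude the diagonals of the within-sample blocks.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    term_xy = k_xy.mean()
    return term_xx + term_yy - 2.0 * term_xy
```

On the benchmarks quoted above, such an estimate would be computed from 25 000 generator features against 25 000 dataset features (or the held-out test set for MNIST and CIFAR-10).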
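
The Experiment Setup row describes an adaptive learning-rate schedule driven by KID comparisons (every 2 000 steps against the model from 20 000 steps earlier; 500 and 5 000 for MNIST). The sketch below is a minimal rendering of that schedule under the assumption of a plain "lower KID is better" comparison; the quoted text defers the exact improvement criterion to Section 4.1 of the paper, and the class and method names here are ours, not from the released code.

```python
class KIDBasedLRSchedule:
    """Minimal sketch of the adaptive learning-rate scheme quoted above.

    Every evaluation step, the current model's KID is compared with that of a
    snapshot taken a fixed number of generator steps earlier; after `patience`
    consecutive failures to improve, the learning rate is halved. The simple
    comparison below stands in for whatever criterion Section 4.1 specifies.
    """

    def __init__(self, initial_lr=1e-4, patience=3, decay=0.5):
        self.lr = initial_lr
        self.patience = patience
        self.decay = decay
        self.failures = 0

    def update(self, kid_current, kid_lagged):
        # An "improvement" here means the current model scores a lower KID than
        # the lagged snapshot on the same comparison set.
        if kid_current < kid_lagged:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.patience:
                self.lr *= self.decay
                self.failures = 0
        return self.lr
```

With the quoted settings (initial rate 10⁻⁴, three failures before halving), a single schedule of this form covers all the models considered, which is the point the authors make about avoiding per-model learning-rate tuning.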