Demystifying MMD GANs
Authors: Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. |
| Researcher Affiliation | Academia | Mikołaj Bińkowski, Department of Mathematics, Imperial College London, mikbinkowski@gmail.com; Danica J. Sutherland, Michael Arbel & Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, {danica.j.sutherland,michael.n.arbel,arthur.gretton}@gmail.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for our models is available at github.com/mbinkowski/MMD-GAN. |
| Open Datasets | Yes | We compare the quality of samples generated by MMD GAN using various kernels with samples obtained by WGAN-GP (Gulrajani et al., 2017) and Cramér GAN (Bellemare et al., 2017) on four standard benchmark datasets: the MNIST dataset of 28×28 handwritten digits, the CIFAR-10 dataset of 32×32 photos (Krizhevsky, 2009), the LSUN dataset of bedroom pictures resized to 64×64 (Yu et al., 2015), and the CelebA dataset of celebrity face images resized and cropped to 160×160 (Liu et al., 2015). |
| Dataset Splits | Yes | Quantitative scores are estimated based on 25 000 generator samples (100 000 for MNIST), and compared to 25 000 dataset elements (for LSUN and CelebA) or the standard test set (10 000 images held out from training for MNIST and CIFAR-10). In supervised deep learning, it is common practice to dynamically reduce the learning rate of an optimizer when it has stopped improving the metric on a validation set. (A sketch of the KID estimator behind these quantitative scores appears after the table.) |
| Hardware Specification | No | The paper mentions that experiments were run 'on our systems' but does not provide specific details such as CPU/GPU models, memory, or detailed computer specifications. It refers to general architectures like 'DCGAN' and 'ResNet generator' but not the hardware they ran on. |
| Software Dependencies | No | The paper mentions scikit-learn (Pedregosa et al., 2011) and TensorFlow in footnotes, but does not provide specific version numbers for these or other key software components used in the experiments. |
| Experiment Setup | Yes | Each model was trained with a batch size of 64 and 5 discriminator updates per generator update. For CIFAR-10, LSUN and CelebA we trained for 150 000 generator updates, while for MNIST we used 50 000. The initial learning rate was set to 10⁻⁴ and followed the adaptive scheme described in Section 4.1, with KID compared between the current model and the model 20 000 generator steps earlier (5 000 for MNIST), every 2 000 steps (500 for MNIST). After 3 consecutive failures to improve, the learning rate was halved. This approach allowed us to avoid manually picking a different learning rate for each of the considered models. We scaled the gradient penalty by 1, instead of the 10 recommended by Gulrajani et al. (2017) and Bellemare et al. (2017); we found this to usually work slightly better with MMD models. (A sketch of this adaptive scheme appears after the table.) |
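
The quantitative scores referenced in the Dataset Splits row include the paper's proposed Kernel Inception Distance (KID): an unbiased squared-MMD estimate between Inception representations of real and generated images under the polynomial kernel k(x, y) = (xᵀy/d + 1)³. The NumPy sketch below shows that estimator on raw feature arrays; the function name `polynomial_mmd2` and the random stand-in features are illustrative assumptions (not the repository's API), and the paper's evaluation may additionally average estimates over sample subsets rather than using one full Gram matrix.

```python
import numpy as np

def polynomial_mmd2(X, Y, degree=3, coef0=1.0):
    """Unbiased squared-MMD estimate with the polynomial kernel
    k(x, y) = (x.y / d + 1)**degree, as KID applies to Inception
    features. X: (m, d) real features; Y: (n, d) generated features."""
    d = X.shape[1]
    # Gram matrices under the polynomial kernel (gamma = 1/d).
    Kxx = (X @ X.T / d + coef0) ** degree
    Kyy = (Y @ Y.T / d + coef0) ** degree
    Kxy = (X @ Y.T / d + coef0) ** degree

    m, n = Kxx.shape[0], Kyy.shape[0]
    # Unbiased estimator: exclude the diagonal terms of Kxx and Kyy.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = 2.0 * Kxy.mean()
    return term_xx + term_yy - term_xy

# Illustrative usage on random stand-ins for 2048-dim Inception features.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(1000, 2048))
fake_feats = rng.normal(size=(1000, 2048))
print(polynomial_mmd2(real_feats, fake_feats))
```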
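The adaptive learning-rate rule from the Experiment Setup row can be expressed as a small training-loop fragment. This is a hedged reconstruction of the description above (evaluate KID every 2 000 generator steps, compare against the evaluation from 20 000 steps earlier, and halve the learning rate after 3 consecutive failures to improve); the stub functions `train_one_generator_step` and `current_kid` are hypothetical placeholders rather than functions from the authors' repository, and the paper's actual decision rule may involve a statistical comparison rather than this scalar one.

```python
import random

CHECK_EVERY = 2_000     # evaluate KID every 2 000 generator steps
LOOKBACK = 20_000       # compare against the score from 20 000 steps earlier
PATIENCE = 3            # consecutive non-improvements before decaying
TOTAL_STEPS = 150_000   # generator updates for CIFAR-10 / LSUN / CelebA

def train_one_generator_step(lr):
    """Stand-in for one generator update; a real run would take an
    optimizer step (plus 5 discriminator updates) here."""
    pass

def current_kid():
    """Stand-in for estimating KID between generated and real samples;
    a random number just keeps the sketch executable."""
    return random.random()

lr = 1e-4       # initial learning rate from the paper's setup
failures = 0
kid_history = {}  # generator step -> KID estimate at that step

for step in range(1, TOTAL_STEPS + 1):
    train_one_generator_step(lr)

    if step % CHECK_EVERY == 0:
        kid_history[step] = current_kid()
        past = step - LOOKBACK
        if past in kid_history:
            if kid_history[step] < kid_history[past]:
                failures = 0   # improved on the older snapshot
            else:
                failures += 1
                if failures >= PATIENCE:
                    lr /= 2    # halve the learning rate
                    failures = 0
```

As the Experiment Setup row notes, this rule let the authors reuse one initial learning rate across all models instead of tuning it per model.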