On gradient regularizers for MMD GANs

Authors: Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | experiments show that it stabilizes and accelerates training, giving image generation models that outperform state-of-the-art methods on 160×160 CelebA and 64×64 unconditional ImageNet.
Researcher Affiliation | Academia | Michael Arbel, Gatsby Computational Neuroscience Unit, University College London (michael.n.arbel@gmail.com); Danica J. Sutherland, Gatsby Computational Neuroscience Unit, University College London (djs@djsutherland.ml); Mikołaj Bińkowski, Department of Mathematics, Imperial College London (mikbinkowski@gmail.com); Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London (arthur.gretton@gmail.com)
Pseudocode | No | The paper includes mathematical formulations and propositions but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code for all of these experiments is available at github.com/MichaelArbel/Scaled-MMD-GAN.
Open Datasets | Yes | We evaluated unsupervised image generation on three datasets: CIFAR-10 [26] (60 000 images, 32×32), CelebA [29] (202 599 face images, resized and cropped to 160×160 as in [7]), and the more challenging ILSVRC2012 (ImageNet) dataset [41] (1 281 167 images, resized to 64×64).
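
As a rough illustration of the preprocessing this row describes, here is a minimal Python sketch assuming Pillow. The square center crop is only a stand-in: the exact 160×160 CelebA crop of [7] is not reproduced in this report.

```python
# Minimal preprocessing sketch for the three datasets, assuming Pillow.
# The square center crop is illustrative only; the exact CelebA crop
# of [7] is not specified in this section.
from PIL import Image

def center_crop_resize(path, out_size):
    """Center-crop to a square, then resize (e.g., out_size=160 for CelebA)."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    s = min(w, h)
    left, top = (w - s) // 2, (h - s) // 2
    square = img.crop((left, top, left + s, top + s))
    return square.resize((out_size, out_size), Image.BILINEAR)

def resize_only(path, out_size=64):
    """Plain resize, as described for 64x64 ImageNet."""
    img = Image.open(path).convert("RGB")
    return img.resize((out_size, out_size), Image.BILINEAR)
```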
Dataset Splits | No | The paper uses well-known datasets but does not give explicit training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproducibility.
Hardware Specification | No | The paper states that models were trained 'on a single GPU' or 'on 3 GPUs simultaneously' but does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware components.
Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We always used 64 samples per GPU from each of P and Q, and 5 critic updates per generator step. We used initial learning rates of 0.0001 for CIFAR-10 and CelebA, 0.0002 for ImageNet, and decayed these rates using the KID adaptive scheme of [7]: every 2 000 steps, generator samples are compared to those from 20 000 steps ago, and if the relative KID test [9] fails to show an improvement three consecutive times, the learning rate is decayed by 0.8. We used the Adam optimizer [25] with β1 = 0.5, β2 = 0.9.
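
The KID-adaptive decay is the most algorithmic part of this setup. Below is a hedged Python reconstruction from the quoted description; `relative_kid_test_improved` is a hypothetical stand-in for the relative KID three-sample test of [9], and the authors' actual implementation is in the repository linked above.

```python
# Hedged sketch of the training schedule quoted above. The constants come
# from the paper; `relative_kid_test_improved` is a hypothetical stand-in
# for the relative KID three-sample test of [9].
SAMPLES_PER_GPU = 64     # samples per GPU from each of P and Q
CRITIC_STEPS = 5         # critic updates per generator step
ADAM_BETAS = (0.5, 0.9)  # Adam beta1, beta2
CHECK_EVERY = 2_000      # steps between schedule checks
LOOKBACK = 20_000        # compare against samples from this many steps ago
PATIENCE = 3             # consecutive failed tests before decaying
DECAY = 0.8              # multiplicative learning-rate decay factor

def adapt_lr(step, lr, fails, samples_now, samples_past,
             relative_kid_test_improved):
    """Return the (possibly decayed) learning rate and updated failure count.

    `samples_now` / `samples_past` are generator samples from the current
    step and from LOOKBACK steps earlier, respectively.
    """
    if step % CHECK_EVERY != 0 or step < LOOKBACK:
        return lr, fails
    if relative_kid_test_improved(samples_now, samples_past):
        return lr, 0                 # improvement found: reset the counter
    fails += 1
    if fails >= PATIENCE:            # three consecutive failures
        return lr * DECAY, 0
    return lr, fails
```

On a natural reading of "three consecutive times", each decay resets the failure counter, so three further consecutive failed tests are required before the rate drops again.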