On gradient regularizers for MMD GANs
Authors: Michael Arbel, Danica J. Sutherland, Mikołaj Bińkowski, Arthur Gretton
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | experiments show that it stabilizes and accelerates training, giving image generation models that outperform state-of-the-art methods on 160×160 CelebA and 64×64 unconditional ImageNet. |
| Researcher Affiliation | Academia | Michael Arbel, Gatsby Computational Neuroscience Unit, University College London, michael.n.arbel@gmail.com; Danica J. Sutherland, Gatsby Computational Neuroscience Unit, University College London, djs@djsutherland.ml; Mikołaj Bińkowski, Department of Mathematics, Imperial College London, mikbinkowski@gmail.com; Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, arthur.gretton@gmail.com |
| Pseudocode | No | The paper includes mathematical formulations and propositions but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for all of these experiments is available at github.com/MichaelArbel/Scaled-MMD-GAN. |
| Open Datasets | Yes | We evaluated unsupervised image generation on three datasets: CIFAR-10 [26] (60,000 images, 32×32), CelebA [29] (202,599 face images, resized and cropped to 160×160 as in [7]), and the more challenging ILSVRC2012 (ImageNet) dataset [41] (1,281,167 images, resized to 64×64). |
| Dataset Splits | No | The paper mentions using well-known datasets, but it does not provide explicit details about the specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits) for reproducibility. |
| Hardware Specification | No | The paper states that models were trained 'on a single GPU' or 'on 3 GPUs simultaneously' but does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware components. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We always used 64 samples per GPU from each of P and Q, and 5 critic updates per generator step. We used initial learning rates of 0.0001 for CIFAR-10 and CelebA, 0.0002 for ImageNet, and decayed these rates using the KID adaptive scheme of [7]: every 2,000 steps, generator samples are compared to those from 20,000 steps ago, and if the relative KID test [9] fails to show an improvement three consecutive times, the learning rate is decayed by 0.8. We used the Adam optimizer [25] with β1 = 0.5, β2 = 0.9. |
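
The KID-based learning-rate decay quoted in the Experiment Setup row is the one non-obvious part of the training protocol, so a minimal sketch of that schedule follows. It only encodes the rule as described in the paper (decay by 0.8 after three consecutive failed relative KID tests, checked every 2,000 generator steps against samples from 20,000 steps earlier); the class name `KIDAdaptiveLR` and the boolean `improved` argument, which stands in for the outcome of the relative KID test [9], are illustrative assumptions rather than the authors' implementation (which lives in the linked repository).

```python
class KIDAdaptiveLR:
    """Sketch of the KID-based learning-rate schedule described above.

    Every `check_every` generator steps, current samples are compared (via the
    relative KID test) against samples from `lag` steps earlier; after
    `patience` consecutive failures to improve, the learning rate is multiplied
    by `decay`. The KID test itself is external to this sketch; its outcome is
    passed in as the boolean `improved`.
    """

    def __init__(self, lr, check_every=2_000, lag=20_000, decay=0.8, patience=3):
        self.lr = lr
        self.check_every = check_every
        self.lag = lag
        self.decay = decay
        self.patience = patience
        self.failures = 0

    def maybe_decay(self, step, improved):
        """Return the (possibly decayed) learning rate at generator step `step`."""
        # Only evaluate once a snapshot from `lag` steps ago exists, and only
        # every `check_every` steps.
        if step < self.lag or step % self.check_every != 0:
            return self.lr
        if improved:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.patience:
                self.lr *= self.decay
                self.failures = 0
        return self.lr
```

Under the paper's reported settings this would be instantiated as `KIDAdaptiveLR(1e-4)` for CIFAR-10 and CelebA or `KIDAdaptiveLR(2e-4)` for ImageNet, alongside Adam with β1 = 0.5, β2 = 0.9, 64 samples per GPU from each of P and Q, and 5 critic updates per generator step.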