Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy
Authors: Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "We consider the problem of bandwidth selection for Gaussian RBF kernels on the Blobs dataset of Gretton et al. (2012b). P here is a 5×5 grid of two-dimensional standard normals... For ε ∈ {1, 2, 4, 6, 8, 10}, we take m = 500 samples from each distribution and compute..." (See the Blobs/MMD sketch after this table.) |
| Researcher Affiliation | Academia | Gatsby Computational Neuroscience Unit, University College London; School of Computer Science, Carnegie Mellon University; Departments of EECS and Statistics, University of California at Berkeley. Contact: djs@djsutherland.ml, htung@cs.cmu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for these experiments is available at github.com/djsutherland/opt-mmd. |
| Open Datasets | Yes | trained on the MNIST dataset of handwritten digits. |
| Dataset Splits | No | The paper mentions using samples for training and testing, but does not provide specific training/validation/test dataset splits (e.g., percentages, counts, or explicit references to standard splits). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used for running its experiments. |
| Software Dependencies | Yes | Intel's MKL library (Intel, 2003) |
| Experiment Setup | Yes | Implementation details: We used the architecture of Li et al. (2015): the generator consists of fully connected layers with sizes 10, 64, 256, 256, 1024, 784, each with ReLU activations except the last, which uses sigmoids. The kernel function for GMMNs is a sum of Gaussian RBF kernels with fixed bandwidths 2, 5, 10, 20, 40, 80. For the feature matching GAN, we use a discriminator with fully connected layers of size 512, 256, 256, 128, 64, each with sigmoid activation. ... We optimize with SGD. Initializations for all parameters are Gaussian with standard deviation 0.1 for the GMMNs and 0.2 for feature matching. Learning rates are 2, 0.02, and 0.5, respectively; the learning rate for the feature matching discriminator is set to 0.01. All experiments are run for 50,000 iterations and use a momentum optimizer with momentum 0.9. (See the generator sketch after this table.) |
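
For readers who want to see the quoted bandwidth-selection experiment concretely, here is a minimal sketch of a Blobs-style dataset and an unbiased MMD² estimate under a Gaussian RBF kernel. The grid spacing, the stretch direction of Q's covariance, and the candidate bandwidths below are illustrative assumptions rather than the paper's exact settings, and `sample_blobs` / `mmd2_unbiased` are hypothetical helper names.

```python
import numpy as np

def sample_blobs(m, eps, rng, grid=5, spacing=10.0):
    """Sample m points each from P (a grid of 2-D standard normals) and Q
    (the same grid with each blob stretched by eps along one direction).
    A sketch of the Blobs construction of Gretton et al. (2012b); the
    spacing and stretch direction here are assumptions."""
    centers = np.array([[i * spacing, j * spacing]
                        for i in range(grid) for j in range(grid)])
    X = centers[rng.integers(len(centers), size=m)] + rng.standard_normal((m, 2))
    # Q: blob covariance with eigenvalues (1, eps), eigenvector (1, 1)/sqrt(2)
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)
    L = np.linalg.cholesky(np.eye(2) + (eps - 1.0) * np.outer(u, u))
    Y = centers[rng.integers(len(centers), size=m)] + rng.standard_normal((m, 2)) @ L.T
    return X, Y

def mmd2_unbiased(X, Y, sigma):
    """Unbiased estimate of MMD^2 with a Gaussian RBF kernel of bandwidth sigma."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    m, n = len(X), len(Y)
    Kxx, Kyy = k(X, X), k(Y, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * k(X, Y).mean())

rng = np.random.default_rng(0)
X, Y = sample_blobs(m=500, eps=6, rng=rng)
for sigma in [0.5, 1.0, 2.0, 5.0, 10.0]:   # candidate bandwidths (illustrative)
    print(f"sigma={sigma}: MMD^2 ~ {mmd2_unbiased(X, Y, sigma):.4f}")
```

The point of the experiment is that this estimate, and the power of the resulting two-sample test, depend strongly on the choice of sigma, which is what motivates the paper's criterion for optimizing the kernel.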
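
The implementation details in the Experiment Setup row map fairly directly onto code. Below is a hedged PyTorch sketch of the GMMN generator and the fixed-bandwidth mixture kernel; the paper's released code at github.com/djsutherland/opt-mmd is the authoritative version, and the noise distribution, batch size, and biased MMD² estimator here are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Generator of Li et al. (2015) as quoted above: fully connected layers
# 10 -> 64 -> 256 -> 256 -> 1024 -> 784, ReLU throughout except a final sigmoid.
generator = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1024), nn.ReLU(),
    nn.Linear(1024, 784), nn.Sigmoid(),
)

def mix_rbf_kernel(X, Y, sigmas=(2, 5, 10, 20, 40, 80)):
    """Sum of Gaussian RBF kernels with the fixed GMMN bandwidths quoted above."""
    d2 = torch.cdist(X, Y) ** 2
    return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in sigmas)

def mmd2_biased(X, Y):
    """Biased MMD^2 estimate (an assumption; other estimators could be used)."""
    return (mix_rbf_kernel(X, X).mean() + mix_rbf_kernel(Y, Y).mean()
            - 2.0 * mix_rbf_kernel(X, Y).mean())

# One illustrative SGD-with-momentum step (lr=2 and momentum=0.9 as quoted).
opt = torch.optim.SGD(generator.parameters(), lr=2.0, momentum=0.9)
real = torch.rand(500, 784)   # stand-in for a batch of flattened MNIST images
z = torch.rand(500, 10)       # uniform noise codes (distribution is an assumption)
loss = mmd2_biased(generator(z), real)
opt.zero_grad()
loss.backward()
opt.step()
```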