Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy

Authors: Danica J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Aaditya Ramdas, Alex Smola, Arthur Gretton

ICLR 2017

Each reproducibility variable below lists the result and the LLM response supporting it.
Research Type: Experimental
LLM Response (Section 4, Experiments): "We consider the problem of bandwidth selection for Gaussian RBF kernels on the Blobs dataset of Gretton et al. (2012b). P here is a 5×5 grid of two-dimensional standard normals... For ε ∈ {1, 2, 4, 6, 8, 10}, we take m = 500 samples from each distribution and compute..."
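To make the quoted setup concrete, here is a minimal sketch of the Blobs construction and an unbiased MMD² estimate under a Gaussian RBF kernel, assuming NumPy. The 5×5 grid and m = 500 come from the quote; the grid spacing, the diagonal stretching of Q by eigenvalue ratio ε, and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sample_blobs(m, eps, grid=5, spacing=10.0, seed=None):
    """Hypothetical Blobs sampler: P is a grid x grid array of 2-D standard
    normals; Q (assumption) stretches each blob along the diagonal so its
    covariance eigenvalues have ratio eps."""
    rng = np.random.default_rng(seed)
    # Each sample independently picks one of the grid x grid blob centers.
    cp = rng.integers(grid, size=(m, 2)) * spacing
    cq = rng.integers(grid, size=(m, 2)) * spacing
    p = cp + rng.standard_normal((m, 2))
    u = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # 45-degree rotation
    cov = u @ np.diag([eps, 1.0]) @ u.T
    q = cq + rng.multivariate_normal(np.zeros(2), cov, size=m)
    return p, q

def mmd2_unbiased(x, y, bw):
    """Unbiased MMD^2 estimate, Gaussian RBF kernel exp(-||a-b||^2 / (2 bw^2))."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bw ** 2))
    kxx, kyy = k(x, x), k(y, y)
    m, n = len(x), len(y)
    return ((kxx.sum() - np.trace(kxx)) / (m * (m - 1))
            + (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
            - 2 * k(x, y).mean())
```

Scanning mmd2_unbiased over candidate bandwidths for each ε gives the flavor of the bandwidth-selection experiment: a poorly chosen bandwidth can make P and Q nearly indistinguishable to the test.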
Researcher Affiliation: Academia
LLM Response: "Gatsby Computational Neuroscience Unit, University College London; School of Computer Science, Carnegie Mellon University; Departments of EECS and Statistics, University of California at Berkeley; djs@djsutherland.ml; htung@cs.cmu.edu"
Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: Yes
LLM Response: "Code for these experiments is available at github.com/djsutherland/opt-mmd."
Open Datasets: Yes
LLM Response: "trained on the MNIST dataset of handwritten images."
Dataset Splits: No
LLM Response: The paper mentions using samples for training and testing, but does not provide specific training/validation/test dataset splits (e.g., percentages, counts, or explicit references to standard splits).
Hardware Specification: No
LLM Response: The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used for running its experiments.
Software Dependencies: Yes
LLM Response: "Intel's MKL library (Intel, 2003)."
Experiment Setup: Yes
LLM Response: "Implementation details: We used the architecture of Li et al. (2015): the generator consists of fully connected layers with sizes 10, 64, 256, 256, 1024, 784, each with ReLU activations except the last, which uses sigmoids. The kernel function for GMMNs is a sum of Gaussian RBF kernels with fixed bandwidths 2, 5, 10, 20, 40, 80. For the feature matching GAN, we use a discriminator with fully connected layers of size 512, 256, 256, 128, 64, each with sigmoid activation. ... We optimize with SGD. Initialization for all parameters is Gaussian with standard deviation 0.1 for the GMMNs and 0.2 for feature matching. Learning rates are 2, 0.02, and 0.5, respectively; the learning rate for the feature matching discriminator is set to 0.01. All experiments are run for 50,000 iterations and use a momentum optimizer with momentum 0.9."
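As a reading aid for the quoted implementation details, here is a minimal sketch of the GMMN generator and the fixed-bandwidth kernel mixture, written in PyTorch. The layer sizes, activations, and bandwidths are taken from the quote; the choice of PyTorch, the biased MMD² estimator, the kernel's bandwidth convention, and all names are assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

# Layer widths from the quote: 10 -> 64 -> 256 -> 256 -> 1024 -> 784,
# ReLU activations everywhere except a final sigmoid.
sizes = [10, 64, 256, 256, 1024, 784]
layers = []
for i in range(len(sizes) - 1):
    layers.append(nn.Linear(sizes[i], sizes[i + 1]))
    layers.append(nn.Sigmoid() if i == len(sizes) - 2 else nn.ReLU())
generator = nn.Sequential(*layers)

def mixed_rbf_mmd2(x, y, bandwidths=(2.0, 5.0, 10.0, 20.0, 40.0, 80.0)):
    """Biased MMD^2 estimate under the quoted sum of Gaussian RBF kernels.
    Convention assumed here: k(a, b) = sum_bw exp(-||a - b||^2 / (2 bw^2))."""
    def k(a, b):
        sq = torch.cdist(a, b) ** 2
        return sum(torch.exp(-sq / (2 * bw ** 2)) for bw in bandwidths)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Hypothetical GMMN step: push generated samples toward a batch of real
# MNIST images (flattened to 784 dims) by minimizing the kernel MMD.
codes = torch.rand(500, 10)          # 10-dim noise codes (prior assumed uniform)
fake = generator(codes)
# loss = mixed_rbf_mmd2(fake, real_batch)   # real_batch: (500, 784) tensor
# loss.backward(); optimizer.step()         # SGD with momentum 0.9, per the quote
```

The quote's per-model learning rates (2, 0.02, 0.5, and 0.01 for the feature-matching discriminator) would be supplied to the momentum-SGD optimizers, which are omitted from this sketch.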