Maximum Mean Discrepancy Gradient Flow

Authors: Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We obtain conditions for convergence of the gradient flow towards a global optimum, that can be related to particle transport when optimizing neural networks. We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient. This algorithmic fix comes with theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions, which can be easily estimated with samples." ... "Figure 1 illustrates the behavior of the proposed algorithm (21) in a simple setting and compares it with three other methods: MMD without noise injection (blue traces), MMD with diffusion (green traces) and KSD (purple traces, [32])." (A sketch of the noise-injected update appears after this table.)
Researcher Affiliation | Academia | Michael Arbel (Gatsby Computational Neuroscience Unit, University College London, michael.n.arbel@gmail.com); Anna Korba (Gatsby Computational Neuroscience Unit, University College London, a.korba@ucl.ac.uk); Adil Salim (Visual Computing Center, KAUST, adil.salim@kaust.edu.sa); Arthur Gretton (Gatsby Computational Neuroscience Unit, University College London, arthur.gretton@gmail.com)
Pseudocode | Yes | Pseudocode is provided in Algorithm 1.
Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor a link to a code repository.
Open Datasets | No | The paper describes using "synthetic data uniform on a hyper-sphere" and "a dataset of 10^3 samples". Appendix G.1 describes the data generation process ("The data X is generated by sampling from the uniform distribution on a hyper-sphere of radius 1") but does not provide a link, DOI, or citation to an existing public dataset, nor explicit code for generating the exact dataset used for replication. (A sampling sketch appears after this table.)
Dataset Splits | No | The paper mentions "validation error", indicating a validation set was used, but does not provide specific details on the dataset splits (e.g., percentages, sample counts, or an explicit methodology for partitioning data into training, validation, and test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., "Python 3.x", "PyTorch 1.x").
Experiment Setup | Yes | The best step-size for each method was selected from {10^-3, 10^-2, 10^-1} and used for 10^4 epochs on a dataset of 10^3 samples (RF). Initial parameters of the networks are drawn from i.i.d. Gaussians: N(0, 1) for the teacher and N(10^-3, 1) for the student. (A configuration sketch appears after this table.)
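
Sketch 1: noise-injected MMD flow update. The Research Type row quotes the paper's description of the MMD flow, its closed-form sample-based gradient, and the noise-injection fix formalized in Algorithm 1. The following is a minimal illustration of one such update as we read it, not the authors' released code; it assumes a Gaussian kernel, NumPy, and hypothetical names (witness_gradient, noisy_mmd_step, noise_level).

    import numpy as np

    def witness_gradient(points, particles, targets, sigma=1.0):
        # Gradient of the MMD witness function at `points`, assuming a Gaussian
        # kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
        def mean_grad_k(points, samples):
            diff = points[:, None, :] - samples[None, :, :]                 # (P, S, D)
            k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))      # (P, S)
            return np.mean(-diff / sigma ** 2 * k[:, :, None], axis=1)      # (P, D)
        return mean_grad_k(points, particles) - mean_grad_k(points, targets)

    def noisy_mmd_step(particles, targets, step_size=1e-2, noise_level=1.0, sigma=1.0):
        # One MMD-flow update with noise injection: the witness gradient is
        # evaluated at noise-perturbed particle positions before the descent step.
        noise = noise_level * np.random.randn(*particles.shape)
        grad = witness_gradient(particles + noise, particles, targets, sigma=sigma)
        return particles - step_size * grad

Setting noise_level to zero recovers the plain (unregularized) MMD descent that the paper compares against in Figure 1.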
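
Sketch 2: synthetic hyper-sphere data. The Open Datasets row notes that the data is drawn uniformly from a hyper-sphere of radius 1 (Appendix G.1) but that no generation code is released. A standard construction for such data, offered here as an assumption rather than the paper's exact generator, is to normalize isotropic Gaussian draws:

    import numpy as np

    def sample_hypersphere(n_samples, dim, radius=1.0, seed=0):
        # Uniform samples on a hyper-sphere of the given radius, obtained by
        # normalizing isotropic Gaussian draws (standard construction).
        rng = np.random.default_rng(seed)
        x = rng.standard_normal((n_samples, dim))
        return radius * x / np.linalg.norm(x, axis=1, keepdims=True)

    # e.g. a 10^3-sample dataset; the dimension here is a placeholder, not the paper's value.
    X = sample_hypersphere(1000, dim=3)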
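
Sketch 3: experiment configuration. The Experiment Setup row gives the step-size grid, the run length, and the Gaussian initializations for the teacher and student networks. A minimal sketch of that configuration follows; the parameter shape and the train_and_validate helper are hypothetical placeholders, and N(m, 1) is read as mean m with unit variance.

    import numpy as np

    STEP_SIZES = [1e-3, 1e-2, 1e-1]   # grid from which the best step-size is chosen
    N_EPOCHS = 10_000                 # 10^4 epochs per run
    N_SAMPLES = 1_000                 # 10^3 samples

    def init_gaussian(shape, mean, std=1.0, seed=None):
        # i.i.d. Gaussian initialization, N(mean, std^2).
        rng = np.random.default_rng(seed)
        return rng.normal(loc=mean, scale=std, size=shape)

    # Teacher parameters ~ N(0, 1), student parameters ~ N(1e-3, 1), per the paper.
    param_shape = (100, 50)           # placeholder shape; the paper's layer sizes are not restated here
    teacher_params = init_gaussian(param_shape, mean=0.0, seed=0)
    student_params = init_gaussian(param_shape, mean=1e-3, seed=1)

    # Hypothetical selection loop: keep the step-size with the lowest validation error.
    # best_step = min(STEP_SIZES, key=lambda lr: train_and_validate(student_params, lr, N_EPOCHS))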