Maximum Mean Discrepancy Gradient Flow
Authors: Michael Arbel, Anna Korba, Adil Salim, Arthur Gretton
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We obtain conditions for convergence of the gradient flow towards a global optimum, which can be related to particle transport when optimizing neural networks. We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient; this algorithmic fix comes with theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions that can be easily estimated with samples (a sketch of the noisy particle update appears after the table). ... Figure 1 illustrates the behavior of the proposed algorithm (21) in a simple setting and compares it with three other methods: MMD without noise injection (blue traces), MMD with diffusion (green traces), and KSD (purple traces, [32]). |
| Researcher Affiliation | Academia | Michael Arbel, Gatsby Computational Neuroscience Unit, University College London, michael.n.arbel@gmail.com; Anna Korba, Gatsby Computational Neuroscience Unit, University College London, a.korba@ucl.ac.uk; Adil Salim, Visual Computing Center, KAUST, adil.salim@kaust.edu.sa; Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London, arthur.gretton@gmail.com |
| Pseudocode | Yes | Pseudocode is provided in Algorithm 1. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology or a link to a code repository. |
| Open Datasets | No | The paper describes using 'synthetic data uniform on a hyper-sphere' and 'a dataset of 10^3 samples'. Appendix G.1 describes the data generation process ('The data X is generated by sampling from the uniform distribution on a hyper-sphere of radius 1') but does not provide a link, DOI, or citation to an existing public dataset, nor explicit code for regenerating the exact dataset used (a sketch of such sampling appears after the table). |
| Dataset Splits | No | The paper mentions 'validation error', indicating that a validation set was used, but does not provide specific details on dataset splits (e.g., percentages, sample counts, or an explicit methodology for partitioning data into training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | Yes | The best step-size for each method was selected from {10^-3, 10^-2, 10^-1} and used for 10^4 epochs on a dataset of 10^3 samples (RF). Initial parameters of the networks are drawn from i.i.d. Gaussians: N(0, 1) for the teacher and N(10^-3, 1) for the student (see the initialization sketch after the table). |
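
For concreteness, the noise-injected MMD flow quoted in the Research Type row admits a short sample-based implementation, since the MMD witness gradient has a closed form. The following is a minimal sketch, assuming a Gaussian kernel and constant step-size and noise level (the paper uses a decaying noise schedule); the function names and default parameters are illustrative, not the authors' reference code.

```python
import numpy as np

def mean_kernel_grad(x, Z, sigma=1.0):
    """Gradient w.r.t. x of (1/|Z|) * sum_z k(x, z) for a Gaussian kernel."""
    diffs = x - Z                                        # shape (m, d)
    w = np.exp(-np.sum(diffs**2, axis=1) / (2 * sigma**2))
    return -(w[:, None] * diffs).mean(axis=0) / sigma**2

def witness_grad(x, X, Y, sigma=1.0):
    """Gradient of the MMD witness function between particles X and targets Y."""
    return mean_kernel_grad(x, X, sigma) - mean_kernel_grad(x, Y, sigma)

def noisy_mmd_flow(X0, Y, step=1e-2, beta=1.0, n_steps=10**4, sigma=1.0, seed=0):
    """Noise-injected MMD gradient descent on a set of particles."""
    rng = np.random.default_rng(seed)
    X = X0.copy()
    for _ in range(n_steps):
        U = rng.standard_normal(X.shape)                 # noise injected into the gradient argument
        G = np.stack([witness_grad(x + beta * u, X, Y, sigma)
                      for x, u in zip(X, U)])
        X = X - step * G
    return X
```

Setting `beta=0` recovers the plain MMD flow without noise injection, which is the comparison the paper draws in Figure 1.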
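
The synthetic data described in Appendix G.1 (uniform on a hyper-sphere of radius 1) is straightforward to regenerate by normalizing Gaussian draws. A minimal sketch follows; the ambient dimension is a placeholder, since the table above does not record it.

```python
import numpy as np

def sample_hypersphere(n, d, radius=1.0, seed=0):
    """Draw n points uniformly on the (d-1)-sphere of the given radius."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, d))                      # isotropic Gaussian directions
    return radius * Z / np.linalg.norm(Z, axis=1, keepdims=True)

X = sample_hypersphere(10**3, 50)                        # 10^3 samples; d = 50 is a placeholder
```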
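
Finally, the reported initialization and step-size grid translate directly into code. The sketch below interprets N(10^-3, 1) as mean 10^-3 with unit variance; the layer shapes are placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 50, 100                                 # placeholder layer sizes

# Teacher weights ~ N(0, 1); student weights ~ N(10^-3, 1) (mean 1e-3, unit variance).
W_teacher = rng.normal(loc=0.0,  size=(d_hidden, d_in))
W_student = rng.normal(loc=1e-3, size=(d_hidden, d_in))

# Step-size grid from which the best value is selected per method.
step_sizes = [1e-3, 1e-2, 1e-1]
```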