A Finite-Particle Convergence Rate for Stein Variational Gradient Descent
Authors: Jiaxin Shi, Lester Mackey
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/√(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements. (A sketch of the kernel Stein discrepancy computation appears after this table.) |
| Researcher Affiliation | Collaboration | Jiaxin Shi (Stanford University, Stanford, CA 94305, jiaxins@stanford.edu); Lester Mackey (Microsoft Research New England, Cambridge, MA 02474, lmackey@microsoft.com). Part of this work was done at Microsoft Research New England. |
| Pseudocode | Yes | Algorithm 1 (n-particle Stein Variational Gradient Descent [18]): SVGD(µ_0^n, r). Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, particle approximation µ_0^n = (1/n) ∑_{i=1}^n δ_{x_i}, rounds r. For s = 0, …, r−1: x_i ← x_i + ϵ_s (1/n) ∑_{j=1}^n [k(x_j, x_i) ∇log p(x_j) + ∇_{x_j} k(x_j, x_i)] for i = 1, …, n. Output: updated approximation µ_r^n = (1/n) ∑_{i=1}^n δ_{x_i} of the target P. Algorithm 2 (Generic Stein Variational Gradient Descent [18]): SVGD(µ_0, r). Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, approximating measure µ_0, rounds r. For s = 0, …, r−1: let µ_{s+1} be the distribution of X_s + ϵ_s ∫ [k(x, X_s) ∇log p(x) + ∇_x k(x, X_s)] dµ_s(x) for X_s ∼ µ_s. Output: updated approximation µ_r of the target P. (A runnable sketch of the n-particle update appears after this table.) |
| Open Source Code | No | The paper does not contain any statements offering access to source code for the methodology described, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a dataset; it focuses on mathematical proofs and convergence rates. Therefore, it does not discuss dataset availability for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits. Therefore, it does not provide specific dataset split information for validation, training, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation. Therefore, it does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe running computational experiments. Therefore, it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation; thus, it does not provide details like hyperparameters or specific training configurations for an empirical setup. While 'step sizes' are mentioned, they are theoretical parameters of the algorithm rather than empirical experimental setup details. |
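
The pseudocode row above describes the n-particle update but leaves the target, kernel, and step size abstract. The following is a minimal sketch of Algorithm 1, assuming a standard-Gaussian target, an RBF kernel with fixed bandwidth, and a constant step size; all three choices are illustrative assumptions rather than anything prescribed by the paper.

```python
# Minimal SVGD sketch (Algorithm 1). Assumed: standard-Gaussian target,
# RBF kernel, constant step size -- none of these is specified by the paper text.
import numpy as np

def score(x):
    """Score ∇ log p(x) of a standard-Gaussian target (illustrative assumption)."""
    return -x

def rbf_kernel_and_grad(x, bandwidth=1.0):
    """Return k(x_j, x_i) and ∇_{x_j} k(x_j, x_i) for all particle pairs."""
    diff = x[:, None, :] - x[None, :, :]             # (n, n, d): diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)             # (n, n)
    k = np.exp(-sq_dist / (2 * bandwidth ** 2))      # (n, n): k[j, i] = k(x_j, x_i)
    grad_k = -diff * k[:, :, None] / bandwidth ** 2  # (n, n, d): ∇_{x_j} k(x_j, x_i)
    return k, grad_k

def svgd(x0, rounds=500, step_size=0.1):
    """x_i <- x_i + eps * (1/n) * sum_j [k(x_j, x_i) ∇log p(x_j) + ∇_{x_j} k(x_j, x_i)]."""
    x = np.array(x0, dtype=float)
    n = x.shape[0]
    for _ in range(rounds):
        k, grad_k = rbf_kernel_and_grad(x)
        # Driving term (attraction toward high density) plus repulsion between particles.
        phi = (k.T @ score(x) + grad_k.sum(axis=0)) / n  # RBF kernel is symmetric, so k.T == k
        x = x + step_size * phi
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = svgd(rng.normal(size=(100, 2)) + 5.0)
    print(particles.mean(axis=0))  # should move toward the target mean (0, 0)
```

The first term in `phi` pulls particles toward high-density regions of the target, while the summed kernel gradients act as a repulsive force that keeps the particles spread out.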
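The abstract quoted in the first row states its guarantee in terms of the kernel Stein discrepancy (KSD) between the n-particle approximation and the target. Below is a minimal sketch of how that quantity can be evaluated for an empirical measure, again assuming a standard-Gaussian target and an IMQ kernel k(x, y) = (c² + ‖x − y‖²)^β; the kernel and its constants are illustrative assumptions, not the paper's prescribed choices.

```python
# Minimal sketch of the squared kernel Stein discrepancy of an n-particle
# approximation. Assumed: standard-Gaussian target, IMQ kernel with c=1, beta=-1/2.
import numpy as np

def ksd_squared(x, c=1.0, beta=-0.5):
    """KSD^2(mu_n, P) = (1/n^2) sum_{i,j} k_P(x_i, x_j) for the Langevin Stein kernel k_P."""
    n, d = x.shape
    s = -x                                        # score of a standard Gaussian (assumption)
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d): x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)               # (n, n)
    base = c ** 2 + r2
    k = base ** beta                              # k(x_i, x_j)
    grad_x = 2 * beta * diff * base[:, :, None] ** (beta - 1)  # ∇_x k(x_i, x_j)
    grad_y = -grad_x                                           # ∇_y k(x_i, x_j)
    # Trace of the mixed second derivative, ∇_x · ∇_y k(x_i, x_j).
    div_xy = (-2 * beta * d * base ** (beta - 1)
              - 4 * beta * (beta - 1) * r2 * base ** (beta - 2))
    kp = (div_xy
          + np.einsum('ijd,jd->ij', grad_x, s)    # ∇_x k(x_i, x_j) · s(x_j)
          + np.einsum('ijd,id->ij', grad_y, s)    # ∇_y k(x_i, x_j) · s(x_i)
          + k * (s @ s.T))                        # k(x_i, x_j) s(x_i) · s(x_j)
    return kp.sum() / n ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(ksd_squared(rng.normal(size=(200, 2))))        # near-target sample: small value
    print(ksd_squared(rng.normal(size=(200, 2)) + 3.0))  # shifted sample: larger value
```

The convergence result summarized above says that, under its sub-Gaussian and Lipschitz-score assumptions, running the SVGD update drives this quantity to zero as the number of particles n grows, at the stated rate.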