A Finite-Particle Convergence Rate for Stein Variational Gradient Descent

Authors: Jiaxin Shi, Lester Mackey

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/√(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements. (The claimed rate is restated schematically after the table.)
Researcher Affiliation | Collaboration | Jiaxin Shi, Stanford University, Stanford, CA 94305, jiaxins@stanford.edu; Lester Mackey, Microsoft Research New England, Cambridge, MA 02474, lmackey@microsoft.com. Part of this work was done at Microsoft Research New England.
Pseudocode | Yes | Algorithm 1: n-particle Stein Variational Gradient Descent [18], SVGD(µ^n_0, r).
Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, particle approximation µ^n_0 = (1/n) Σ_{i=1}^n δ_{x_i}, rounds r.
for s = 0, ..., r-1 do
  x_i ← x_i + ϵ_s (1/n) Σ_{j=1}^n [k(x_j, x_i) ∇log p(x_j) + ∇_{x_j} k(x_j, x_i)]  for i = 1, ..., n
Output: updated approximation µ^n_r = (1/n) Σ_{i=1}^n δ_{x_i} of the target P.
Algorithm 2: Generic Stein Variational Gradient Descent [18], SVGD(µ_0, r).
Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, approximating measure µ_0, rounds r.
for s = 0, ..., r-1 do
  Let µ_{s+1} be the distribution of X_s + ϵ_s ∫ [k(x, X_s) ∇log p(x) + ∇_x k(x, X_s)] dµ_s(x) for X_s ∼ µ_s
Output: updated approximation µ_r of the target P.
(A runnable sketch of Algorithm 1 appears after the table.)
Open Source Code | No | The paper does not contain any statements offering access to source code for the methodology described, nor does it provide any links to a code repository.
Open Datasets | No | The paper is theoretical and does not conduct experiments on a dataset; it focuses on mathematical proofs and convergence rates. Therefore, it does not discuss dataset availability for training.
Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits. Therefore, it does not provide specific dataset split information for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation. Therefore, it does not specify any hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not describe running computational experiments. Therefore, it does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation; thus, it does not provide details such as hyperparameters or training configurations for an empirical setup. While step sizes are mentioned, they are parameters of the algorithm's theoretical analysis rather than details of an empirical experimental setup.
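
For reference, the convergence claim quoted in the Research Type row can be restated schematically. This is an informal paraphrase of the abstract only: the constants, the step-size schedule, the required number of rounds r, and the precise regularity conditions are omitted here, and KSD abbreviates the kernel Stein discrepancy.

\[
  \mathrm{KSD}\!\left(\mu^{n}_{r},\, P\right) \;=\; O\!\left(\frac{1}{\sqrt{\log \log n}}\right)
  \qquad \text{for a sub-Gaussian target } P \text{ with Lipschitz score,}
\]

where \mu^{n}_{r} denotes the n-particle SVGD output of Algorithm 1 after r rounds with an appropriate step size sequence.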
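
The following is a minimal NumPy sketch of the n-particle update in Algorithm 1. The standard Gaussian target (so ∇log p(x) = -x), the RBF kernel, its fixed bandwidth, and the constant step size are illustrative assumptions for this sketch and are not taken from the paper.

import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Return K[j, i] = k(x_j, x_i) and grad_K[j, i] = grad_{x_j} k(x_j, x_i)."""
    diffs = X[:, None, :] - X[None, :, :]          # diffs[j, i] = x_j - x_i, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))
    grad_K = -diffs / bandwidth ** 2 * K[:, :, None]
    return K, grad_K

def svgd(X0, score, step_sizes):
    """Run the n-particle SVGD update on an (n, d) particle array X0."""
    X = X0.copy()
    n = X.shape[0]
    for eps in step_sizes:
        K, grad_K = rbf_kernel(X)
        # phi[i] = (1/n) sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
        phi = (K.T @ score(X) + grad_K.sum(axis=0)) / n
        X = X + eps * phi
    return X

rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 2)) + 5.0               # particles initialized far from the target
score = lambda X: -X                               # grad log p for a standard Gaussian target
X_final = svgd(X0, score, step_sizes=[0.1] * 500)
print("particle mean:", X_final.mean(axis=0))      # should move toward the origin

As a usage note, the score function and kernel here are stand-ins; in practice the score is supplied by the target density of interest and the kernel bandwidth is often set by a heuristic such as the median pairwise distance.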