A Finite-Particle Convergence Rate for Stein Variational Gradient Descent

Authors: Jiaxin Shi, Lester Mackey

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/√(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements. (The claimed rate is restated schematically after the table.)
Researcher Affiliation | Collaboration | Jiaxin Shi, Stanford University, Stanford, CA 94305, jiaxins@stanford.edu; Lester Mackey, Microsoft Research New England, Cambridge, MA 02474, lmackey@microsoft.com. Part of this work was done at Microsoft Research New England.
Pseudocode | Yes | Algorithm 1: n-particle Stein Variational Gradient Descent [18], SVGD(µ^n_0, r).
Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, particle approximation µ^n_0 = (1/n) Σ_{i=1}^n δ_{x_i}, rounds r.
for s = 0, ..., r-1 do
  x_i ← x_i + ϵ_s (1/n) Σ_{j=1}^n [k(x_j, x_i) ∇log p(x_j) + ∇_{x_j} k(x_j, x_i)]  for i = 1, ..., n
Output: updated approximation µ^n_r = (1/n) Σ_{i=1}^n δ_{x_i} of the target P.
Algorithm 2: Generic Stein Variational Gradient Descent [18], SVGD(µ_0, r).
Input: target P with density p, kernel k, step sizes (ϵ_s)_{s≥0}, approximating measure µ_0, rounds r.
for s = 0, ..., r-1 do
  Let µ_{s+1} be the distribution of X_s + ϵ_s ∫ [k(x, X_s) ∇log p(x) + ∇_x k(x, X_s)] dµ_s(x) for X_s ∼ µ_s
Output: updated approximation µ_r of the target P.
(A runnable sketch of Algorithm 1 appears after the table.)
Open Source Code | No | The paper does not contain any statements offering access to source code for the methodology described, nor does it provide any links to a code repository.
Open Datasets | No | The paper is theoretical and does not conduct experiments on a dataset; it focuses on mathematical proofs and convergence rates. Therefore, it does not discuss dataset availability for training.
Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits. Therefore, it does not provide specific dataset split information for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation. Therefore, it does not specify any hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not describe running computational experiments. Therefore, it does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and convergence rates rather than empirical evaluation; thus, it does not provide details such as hyperparameters or training configurations for an empirical setup. While step sizes are mentioned, they are parameters of the algorithm's theoretical analysis rather than details of an empirical experimental setup.
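
For reference, the convergence claim quoted in the Research Type row can be restated schematically. This is an informal paraphrase of the abstract only: the constants, the step-size schedule, the required number of rounds r, and the precise regularity conditions are omitted here, and KSD abbreviates the kernel Stein discrepancy.

\[
  \mathrm{KSD}\!\left(\mu^{n}_{r},\, P\right) \;=\; O\!\left(\frac{1}{\sqrt{\log \log n}}\right)
  \qquad \text{for a sub-Gaussian target } P \text{ with Lipschitz score,}
\]

where \mu^{n}_{r} denotes the n-particle SVGD output of Algorithm 1 after r rounds with an appropriate step size sequence.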
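
The following is a minimal NumPy sketch of the n-particle update in Algorithm 1. The standard Gaussian target (so ∇log p(x) = -x), the RBF kernel, its fixed bandwidth, and the constant step size are illustrative assumptions for this sketch and are not taken from the paper.

import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Return K[j, i] = k(x_j, x_i) and grad_K[j, i] = grad_{x_j} k(x_j, x_i)."""
    diffs = X[:, None, :] - X[None, :, :]          # diffs[j, i] = x_j - x_i, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))
    grad_K = -diffs / bandwidth ** 2 * K[:, :, None]
    return K, grad_K

def svgd(X0, score, step_sizes):
    """Run the n-particle SVGD update on an (n, d) particle array X0."""
    X = X0.copy()
    n = X.shape[0]
    for eps in step_sizes:
        K, grad_K = rbf_kernel(X)
        # phi[i] = (1/n) sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
        phi = (K.T @ score(X) + grad_K.sum(axis=0)) / n
        X = X + eps * phi
    return X

rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 2)) + 5.0               # particles initialized far from the target
score = lambda X: -X                               # grad log p for a standard Gaussian target
X_final = svgd(X0, score, step_sizes=[0.1] * 500)
print("particle mean:", X_final.mean(axis=0))      # should move toward the origin

As a usage note, the score function and kernel here are stand-ins; in practice the score is supplied by the target density of interest and the kernel bandwidth is often set by a heuristic such as the median pairwise distance.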