Understanding the Variance Collapse of SVGD in High Dimensions

Authors: Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we attempt to explain the variance collapse in SVGD. On the qualitative side, we compare the SVGD update with gradient descent on the maximum mean discrepancy (MMD) objective; we find that the variance collapse phenomenon relates to the bias from deterministic updates present in the driving force of SVGD, and empirically verify that removal of such bias leads to more accurate variance estimation. On the quantitative side, we demonstrate that the variance collapse of SVGD can be accurately predicted in the proportional asymptotic limit, i.e., when the number of particles n and the dimension d diverge at the same rate. In particular, for learning high-dimensional isotropic Gaussians, we derive the exact equilibrium variance for both SVGD and MMD-descent, under a certain empirically verified near-orthogonality condition on the converged particles, and confirm that SVGD suffers from the curse of dimensionality. (A minimal sketch of the SVGD update appears after the table.)
Researcher Affiliation | Academia | University of Toronto, Vector Institute, University of Tokyo, RIKEN AIP, Tsinghua University
Pseudocode | Yes | Algorithm 1: SVGD with Particle Resampling
Open Source Code | No | The paper does not contain any explicit statement or link to publicly available source code for the methodology described.
Open Datasets | No | The paper uses synthetic datasets and a custom BNN setup. For example, it describes generating inputs and labels for the Bayesian logistic regression problem: "we sample the coordinates of z_i from a Rademacher distribution and then normalize the vector by its Euclidean norm; the labels are generated from a Bernoulli distribution with true parameters θ = 1_d; we set m = 500, d = 100, and the regularization parameter α = 1." It does not provide access information (link, DOI, formal citation) for a publicly available dataset. (A sketch of this data-generation recipe appears after the table.)
Dataset Splits | No | The paper describes experiment setups (e.g., initialization, number of iterations, learning rates) but does not specify training/validation/test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | For both updates we use the Gaussian RBF kernel with the median bandwidth heuristic. For SVGD, we evolve 100 particles for 50k iterations using learning rate η = 5e-3, whereas for MMD-descent, we evolve 10 particles using HMC particles as approximate target samples to compute the driving force term. The particles are either initialized from the (approximate) target distribution (as in Figure 1), or from the standard normal distribution (as in Figure 9). ... For all experiments in Section 5, we initialize the particles from N(0, 0.8 I_d), and run SVGD (or MMD-descent) with learning rate η = 10^-1 for 20k iterations. (Sketches of the median bandwidth heuristic and this setup appear after the table.)
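
For reference on the Research Type row, the SVGD update that the paper contrasts with MMD-descent takes the standard form x_i <- x_i + (η/n) Σ_j [ k(x_j, x_i) ∇ log p(x_j) + ∇_{x_j} k(x_j, x_i) ]. Below is a minimal NumPy sketch of that update for an isotropic Gaussian target; it is an illustration under an assumed kernel parameterization and naming, not the authors' code.

```python
import numpy as np

def rbf_kernel(x, h):
    """Gaussian RBF kernel K[j, i] = exp(-||x_j - x_i||^2 / (2h)) and its gradient w.r.t. x_j."""
    diffs = x[:, None, :] - x[None, :, :]      # diffs[j, i] = x_j - x_i, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * h))
    grad_K = -diffs * (K[:, :, None] / h)      # grad_{x_j} k(x_j, x_i)
    return K, grad_K

def svgd_step(x, score, h, eta):
    """One SVGD update: kernel-smoothed score (driving force) plus repulsive kernel-gradient term."""
    n = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    drive = K.T @ score(x)           # sum_j k(x_j, x_i) * grad log p(x_j)
    repulse = grad_K.sum(axis=0)     # sum_j grad_{x_j} k(x_j, x_i)
    return x + eta * (drive + repulse) / n

# Isotropic Gaussian target N(0, I_d): the score is simply -x.
score = lambda x: -x
```

MMD-descent replaces the kernel-smoothed score above with a driving force computed from (approximate) target samples, which is the comparison the paper draws.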
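The synthetic data described in the Open Datasets row can be generated in a few lines; the sketch below follows the quoted recipe (Rademacher coordinates normalized to unit norm, Bernoulli labels with true parameter θ = 1_d, m = 500, d = 100). The function name, seed, and the logistic link used to form the Bernoulli probabilities are assumptions for illustration.

```python
import numpy as np

def make_logistic_data(m=500, d=100, seed=0):
    """Synthetic Bayesian logistic regression data following the quoted description (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Rademacher coordinates, each input normalized to unit Euclidean norm.
    z = rng.choice([-1.0, 1.0], size=(m, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    # Labels from a Bernoulli distribution with true parameter theta = 1_d (logistic link assumed).
    theta = np.ones(d)
    p = 1.0 / (1.0 + np.exp(-z @ theta))
    y = rng.binomial(1, p)
    return z, y

z, y = make_logistic_data()
```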
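The Experiment Setup row relies on the Gaussian RBF kernel with the median bandwidth heuristic. A common form of that heuristic, together with the quoted initialization from N(0, 0.8 I_d), is sketched below; the exact bandwidth scaling and the dimension d used here are assumptions.

```python
import numpy as np

def median_bandwidth(x):
    """Median heuristic: h = median(||x_i - x_j||^2) / log(n), a common SVGD default (assumed form)."""
    n = x.shape[0]
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.median(sq_dists[np.triu_indices(n, k=1)]) / np.log(n)

# Quoted configuration for the Section 5 experiments: particles initialized from N(0, 0.8 I_d),
# learning rate 1e-1, 20k iterations. The dimension d below is illustrative.
rng = np.random.default_rng(0)
d, n_particles = 50, 100
x = rng.normal(0.0, np.sqrt(0.8), size=(n_particles, d))
eta, n_iters = 1e-1, 20_000
```

Combined with svgd_step and score from the first sketch, a run is simply: for each of n_iters steps, set x = svgd_step(x, score, median_bandwidth(x), eta).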