Understanding the Variance Collapse of SVGD in High Dimensions
Authors: Jimmy Ba, Murat A Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we attempt to explain the variance collapse in SVGD. On the qualitative side, we compare the SVGD update with gradient descent on the maximum mean discrepancy (MMD) objective; we find that the variance collapse phenomenon relates to the bias from deterministic updates present in the driving force of SVGD, and empirically verify that removal of such bias leads to more accurate variance estimation. On the quantitative side, we demonstrate that the variance collapse of SVGD can be accurately predicted in the proportional asymptotic limit, i.e., when the number of particles n and dimensions d diverge at the same rate. In particular, for learning high-dimensional isotropic Gaussians, we derive the exact equilibrium variance for both SVGD and MMD-descent, under a certain empirically verified near-orthogonality condition on the converged particles, and confirm that SVGD suffers from the curse of dimensionality. |
| Researcher Affiliation | Academia | 1University of Toronto, 2Vector Institute, 3University of Tokyo, 4RIKEN AIP, 5Tsinghua University |
| Pseudocode | Yes | Algorithm 1 SVGD with Particle Resampling (see the update sketch after this table) |
| Open Source Code | No | The paper does not contain any explicit statement or link to publicly available source code for the methodology described. |
| Open Datasets | No | The paper uses synthetic datasets and a custom BNN setup. For example, it describes generating inputs and labels for the Bayesian logistic regression problem: "we sample the coordinates of z_i from a Rademacher distribution and then normalize the vector by its Euclidean norm; the labels are generated from a Bernoulli distribution with true parameters θ = 1_d; we set m = 500, d = 100, and the regularization parameter α = 1." It does not provide access information (link, DOI, formal citation) for a publicly available dataset. (A data-generation sketch appears after this table.) |
| Dataset Splits | No | The paper describes experiment setups (e.g., initialization, number of iterations, learning rates) but does not specify training/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | For both updates we use the Gaussian RBF kernel with the median bandwidth heuristic. For SVGD, we evolve 100 particles for 50k iterations using learning rate η = 5e-3, whereas for MMD-descent, we evolve 10 particles using HMC particles as approximate target samples to compute the driving force term. The particles are either initialized from the (approximate) target distribution (as in Figure 1), or from a standard normal distribution (as in Figure 9). ... For all experiments in Section 5, we initialize the particles from N(0, 0.8 I_d), and run SVGD (or MMD-descent) with learning rate η = 10^{-1} for 20k iterations. (See the driver sketch after this table.) |
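
The Pseudocode row points to Algorithm 1 (SVGD with Particle Resampling), but the table does not detail its resampling step, so the sketch below only illustrates the standard SVGD update with the Gaussian RBF kernel and median bandwidth heuristic named in the Experiment Setup row. The function names (`svgd_step`, `score_fn`) and the exact form of the median heuristic are assumptions for illustration, not the authors' code.

```python
import numpy as np

def svgd_step(x, score_fn, lr):
    """One SVGD update with a Gaussian RBF kernel and a median bandwidth heuristic.

    x        : (n, d) array of particles
    score_fn : maps an (n, d) array to the (n, d) array of grad log p(x_i)
    lr       : step size
    """
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]        # (n, n, d), diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)       # pairwise squared distances
    h = np.median(sq_dists) / np.log(n + 1)      # median bandwidth heuristic (assumed form)
    K = np.exp(-sq_dists / h)                    # RBF kernel matrix
    # driving force: (1/n) * sum_j K_ij * grad log p(x_j)
    drive = K @ score_fn(x) / n
    # repulsion: (1/n) * sum_j grad_{x_j} K(x_j, x_i) = (1/n) * sum_j (2/h) (x_i - x_j) K_ij
    repulsion = (2.0 / h) * np.sum(K[..., None] * diffs, axis=1) / n
    return x + lr * (drive + repulsion)
```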
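The Open Datasets row quotes how the Bayesian logistic regression inputs and labels are generated. A minimal generator following that description could look like the sketch below; the logistic link used for the Bernoulli labels is an assumption, since the quote does not spell the label model out.

```python
import numpy as np

def make_logistic_data(m=500, d=100, seed=0):
    """Synthetic data roughly matching the quoted setup: Rademacher inputs
    normalized to unit Euclidean norm, Bernoulli labels with true parameter
    theta = 1_d (logistic link assumed)."""
    rng = np.random.default_rng(seed)
    z = rng.choice([-1.0, 1.0], size=(m, d))          # Rademacher coordinates
    z /= np.linalg.norm(z, axis=1, keepdims=True)     # normalize each input vector
    theta = np.ones(d)                                # true parameter 1_d
    p = 1.0 / (1.0 + np.exp(-z @ theta))              # assumed logistic link
    y = rng.binomial(1, p)                            # Bernoulli labels
    return z, y
```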
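As a usage example of the `svgd_step` sketch above, the following hypothetical driver mirrors the Section 5 setup quoted in the Experiment Setup row: particles initialized from N(0, 0.8 I_d) and evolved with learning rate 0.1 for 20k iterations. The target (a standard isotropic Gaussian with score −x), the dimension d, and the combination with 100 particles are assumptions made only to keep the example self-contained.

```python
import numpy as np

# Hypothetical driver: particles drawn from N(0, 0.8 I_d), then evolved by SVGD
# with learning rate 0.1 for 20k iterations, as in the quoted Section 5 setup.
# The standard-Gaussian target (score = -x) and d = 50 are assumptions.
n, d, n_iters, lr = 100, 50, 20_000, 0.1
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(0.8), size=(n, d))

for _ in range(n_iters):
    x = svgd_step(x, score_fn=lambda xs: -xs, lr=lr)

# For an isotropic Gaussian target the per-coordinate particle variance is the
# quantity whose collapse the paper studies.
print("mean per-coordinate variance of the particles:", x.var(axis=0).mean())
```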