Stein Variational Message Passing for Continuous Graphical Models

Authors: Dilin Wang, Zhe Zeng, Qiang Liu

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results show that our method outperforms a variety of baselines including standard MCMC and particle message passing methods.
Researcher Affiliation | Academia | 1Department of Computer Science, The University of Texas at Austin 2School of Mathematical Sciences, Zhejiang University. Correspondence to: Dilin Wang <dilin@cs.utexas.edu>, Qiang Liu <lqiang@cs.utexas.edu>.
Pseudocode | Yes | Algorithm 1: Graphical Stein Variational Gradient Descent
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We evaluate our approach on the Price UCI dataset (Liu et al., 2013)
Dataset Splits | No | The paper mentions using a 'validation dataset' for selecting learning rates ('We also select the best learning rate for Langevin, SVGD (vanilla) and SVGD (graphical) in the aforementioned validation dataset.'), but it does not provide specific, reproducible information about how the data was split into training, validation, and test sets for general model evaluation.
Hardware Specification | No | The paper does not specify any details about the hardware used for running the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper mentions 'AdaGrad (Duchi et al., 2011)' and 'NUTS (Hoffman & Gelman, 2014)' as software components or algorithms used, but it does not provide specific version numbers for any software or libraries.
Experiment Setup | Yes | For all our experiments, we use Gaussian RBF kernel for both the vanilla and graphical SVGD and choose the bandwidth using the standard median trick. Specifically, for graphical SVGD, the kernel we use is k_i(x, x') := exp(-||x_{C_i} - x'_{C_i}||_2^2 / h_i) with bandwidth h_i = med_i^2, where med_i is the median of pairwise distances between {x^ℓ_{C_i}}_{ℓ=1}^n for each node x_i. We use AdaGrad (Duchi et al., 2011) for step size unless otherwise specified.
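The 'Experiment Setup' row describes a node-local Gaussian RBF kernel whose bandwidth is set by the median trick on the coordinates of the clique C_i. Below is a minimal sketch of that computation, assuming the particles are stored as an (n, d) NumPy array and that C_i is a list of coordinate indices for node i's neighborhood; the helper names `median_bandwidth` and `local_rbf_kernel` are illustrative, not from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def median_bandwidth(particles_Ci):
    """h_i = med_i^2, where med_i is the median pairwise distance
    between the particles restricted to the clique C_i."""
    dists = pdist(particles_Ci)          # pairwise Euclidean distances
    med = np.median(dists)
    return max(med ** 2, 1e-8)           # guard against a zero bandwidth

def local_rbf_kernel(particles, C_i):
    """k_i(x, x') = exp(-||x_{C_i} - x'_{C_i}||_2^2 / h_i) for all particle pairs."""
    x_Ci = particles[:, C_i]             # (n, |C_i|) slice of the particles
    h_i = median_bandwidth(x_Ci)
    sq = squareform(pdist(x_Ci, "sqeuclidean"))   # (n, n) squared distances
    K = np.exp(-sq / h_i)
    # Gradient of k_i(x^m, x^l) w.r.t. the C_i coordinates of x^m,
    # needed by the SVGD repulsive term: shape (n, n, |C_i|).
    grad_K = -2.0 / h_i * (x_Ci[:, None, :] - x_Ci[None, :, :]) * K[:, :, None]
    return K, grad_K
```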
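The 'Pseudocode' row points to Algorithm 1 (graphical SVGD), for which no open-source code is available per the 'Open Source Code' row. The following is a hedged sketch of one iteration, assuming the update takes the standard SVGD form applied per coordinate with its node-local kernel and an AdaGrad step size; `grad_log_p`, `cliques` (with each C_i containing node i), and `adagrad_state` (initialized to zeros) are assumed inputs, and this is a reading of the algorithm rather than the authors' implementation.

```python
import numpy as np

def graphical_svgd_step(particles, grad_log_p, cliques, adagrad_state, lr=0.05, eps=1e-6):
    """One graphical SVGD iteration: each coordinate i is updated with its own
    local kernel k_i via
      phi_i(x^l) = 1/n * sum_m [ k_i(x^m, x^l) * d/dx_i^m log p(x^m)
                                 + d/dx_i^m k_i(x^m, x^l) ]."""
    n, d = particles.shape
    scores = grad_log_p(particles)                 # (n, d) gradients of log p at each particle
    phi = np.zeros_like(particles)
    for i, C_i in enumerate(cliques):              # C_i: indices of node i's neighborhood
        # Reuses local_rbf_kernel from the sketch above.
        K, grad_K = local_rbf_kernel(particles, C_i)
        pos = list(C_i).index(i)                   # position of coordinate i inside C_i
        drive = K.T @ scores[:, i]                 # sum_m k_i(x^m, x^l) * score_i(x^m)
        repulse = grad_K[:, :, pos].sum(axis=0)    # sum_m d/dx_i^m k_i(x^m, x^l)
        phi[:, i] = (drive + repulse) / n
    # AdaGrad step sizes, as mentioned in the Experiment Setup row.
    adagrad_state += phi ** 2
    particles = particles + lr * phi / (np.sqrt(adagrad_state) + eps)
    return particles, adagrad_state
```

The per-node kernel depends only on x_{C_i}, so each coordinate's update is driven by its clique rather than the full joint state, which is the paper's motivation for the graphical variant of SVGD.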