Message Passing Stein Variational Gradient Descent
Authors: Jingwei Zhuo, Chang Liu, Jiaxin Shi, Jun Zhu, Ning Chen, Bo Zhang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis finds that there exists a negative correlation between the dimensionality and the repulsive force of SVGD, which should be blamed for this phenomenon. We propose Message Passing SVGD (MP-SVGD) to solve this problem. By leveraging the conditional independence structure of probabilistic graphical models (PGMs), MP-SVGD converts the original high-dimensional global inference problem into a set of local ones over the Markov blanket with lower dimensions. Experimental results show its advantages of preventing vanishing repulsive force in high-dimensional space over SVGD, and its particle efficiency and approximation flexibility over other inference methods on graphical models. |
| Researcher Affiliation | Academia | Jingwei Zhuo 1 Chang Liu 1 Jiaxin Shi 1 Jun Zhu 1 Ning Chen 1 Bo Zhang 1 1Dept. of Comp. Sci. & Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., THBI Lab, Tsinghua University, Beijing, 100084, China. Correspondence to: Jingwei Zhuo <zjw15@mails.tsinghua.edu.cn>, Jun Zhu <dcszj@tsinghua.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Message Passing SVGD |
| Open Source Code | No | The paper does not provide concrete access to source code (no specific repository link, explicit code release statement, or mention of code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | We follow the settings of (Lienart et al., 2015) and focus on a pairwise MRF on the 2D grid... We run 100 chains in parallel with 40,000 samples for each chain after 10,000 burned-in, i.e. 4 million samples in total. ... We use the Gaussian distribution as the factors, and the moment matching step is done by numerical integration due to the non-Gaussian nature of p(x). EPBP is a variant of BP methods and the original state-of-the-art method on this task. It uses weighted samples to estimate the messages while other methods (except EP) use unweighted samples to approximate p(x) directly. ... We focus on the pairwise MRF where F indexes all the edge factors, Ji = [1, 1], N = 1 and J = 15. All the parameters (i.e., ϵ, Ji, σi and sj) are pre-learned and details can be found in (Schmidt et al., 2010). ... Table 1: Denoising results for 10 test images (Lan et al., 2006) from the BSD dataset (Martin et al., 2001). |
| Dataset Splits | No | The paper mentions generating ground truth data and using test images from public datasets, but it does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a library for Bayesian deep learning (ZhuSuan: A library for Bayesian deep learning. arXiv preprint arXiv:1709.05870, 2017) in the references, but it does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | We use the RBF kernel with the bandwidth chosen by the median heuristic for all experiments. ... For EP, we use the Gaussian distribution as the factors, and the moment matching step is done by numerical integration due to the non-Gaussian nature of p(x). ... Parameters α1 and α2 are set to 0.6 and 0.4. ... We consider a 10 x 10 grid except Fig. 5, whose grid size ranges from 2 x 2 to 10 x 10. All experimental results are averaged over 10 runs with random initializations. ... We run 100 chains in parallel with 40,000 samples for each chain after 10,000 burned-in, i.e. 4 million samples in total. ... We compare SVGD and MP-SVGD with Gibbs sampling with auxiliary variables (Aux. Gibbs)... |
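For context on the kernel choice reported in the Experiment Setup row, the following is a minimal NumPy sketch of a vanilla SVGD update using an RBF kernel with the median-heuristic bandwidth. This is an illustrative reconstruction, not the authors' MP-SVGD code: the target distribution, step size, particle count, and the exact form of the median heuristic (here `h = med² / log(n + 1)`, one common variant) are assumptions for demonstration only.

```python
import numpy as np

def svgd_step(x, grad_logp, stepsize=0.1):
    """One vanilla SVGD update (Liu & Wang, 2016) with an RBF kernel
    whose bandwidth is chosen by the median heuristic.

    x: (n, d) array of particles; grad_logp: maps (n, d) -> (n, d) scores.
    """
    n = x.shape[0]
    # Pairwise squared distances between particles
    diff = x[:, None, :] - x[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    # Median heuristic bandwidth (one common variant): h = med^2 / log(n + 1)
    h = np.median(sq) / np.log(n + 1) + 1e-8
    K = np.exp(-sq / h)
    # Driving term: kernel-smoothed scores pull particles toward high density
    drive = K @ grad_logp(x)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) pushes particles apart.
    # Its magnitude shrinks as dimensionality grows, which is the vanishing
    # repulsive force that MP-SVGD counteracts via lower-dimensional
    # Markov-blanket updates.
    rep = (2.0 / h) * (K.sum(axis=1, keepdims=True) * x - K @ x)
    return x + stepsize * (drive + rep) / n

# Demo: transport particles initialized at N(3, 1) toward a standard
# Gaussian target, whose score is grad log p(x) = -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=(50, 1))
for _ in range(1000):
    x = svgd_step(x, lambda z: -z, stepsize=0.2)
```

MP-SVGD replaces the single global kernel over all dimensions with one such update per variable, conditioned on its Markov blanket, so each kernel operates in a much lower-dimensional space.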