Riemannian Stein Variational Gradient Descent for Bayesian Inference
Authors: Chang Liu, Jun Zhu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show the advantages over SVGD of exploring distribution geometry, and the advantages of particle-efficiency, iteration-effectiveness, and approximation flexibility over other inference methods on Riemann manifolds. |
| Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., TNList Lab; Center for Bio-Inspired Computing Research; State Key Lab for Intell. Tech. & Systems, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper presents mathematical derivations and descriptions of the algorithm's dynamics and updates (e.g., "A 1st-order approximation of the flow is y(t + ε) = Exp_{y(t)}(ε X̂(y(t)))"), but it does not include a formally structured pseudocode block or an algorithm listing. An illustrative sketch of such an update appears after the table. |
| Open Source Code | Yes | Codes and data available at http://ml.cs.tsinghua.edu.cn/~changliu/rsvgd/ |
| Open Datasets | Yes | We use the Splice dataset (1,000 training entries, 60 features), one of the benchmark datasets compiled by Mika et al. (1999), and the Covertype dataset (581,012 entries, 54 features) also used by Liu and Wang (2016). We run all the methods on the 20News-different dataset (1,666 training entries, 5,000 features) with default hyperparameters the same as in (Liu, Zhu, and Song 2016). |
| Dataset Splits | Yes | Each run on Covertype uses a random train(80%)-test(20%) split as in (Liu and Wang 2016). |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We fix α = 0.01 and use 100 particles for both methods. Each run on Covertype uses a random train (80%) / test (20%) split as in (Liu and Wang 2016). RSVGD updates particles by the aforementioned 1st-order flow approximation, which is effectively vanilla gradient descent, while SVGD uses the recommended AdaGrad with momentum. We run all the methods on the 20News-different dataset (1,666 training entries, 5,000 features) with default hyperparameters the same as in (Liu, Zhu, and Song 2016). |
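The Pseudocode row quotes the paper's first-order flow approximation, y(t + ε) = Exp_{y(t)}(ε X̂(y(t))). The snippet below is a minimal illustrative sketch of such an update on the unit sphere, where the exponential map has a closed form; the sphere manifold and the `velocity_field` callable are assumptions for illustration, not the paper's actual construction.

```python
import numpy as np

def sphere_exp(y, v):
    """Exponential map on the unit sphere: follow the geodesic from y along tangent v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return y
    return np.cos(norm_v) * y + np.sin(norm_v) * (v / norm_v)

def tangent_project(y, g):
    """Project an ambient-space vector g onto the tangent space of the sphere at y."""
    return g - np.dot(y, g) * y

def first_order_flow_step(y, velocity_field, eps=0.01):
    """One step of y(t + eps) = Exp_{y(t)}(eps * X_hat(y(t))), sketched on the sphere."""
    x_hat = tangent_project(y, velocity_field(y))  # tangent velocity X_hat(y(t))
    return sphere_exp(y, eps * x_hat)

# Toy usage: push one particle along a fixed (hypothetical) velocity field.
y0 = np.array([1.0, 0.0, 0.0])
y1 = first_order_flow_step(y0, velocity_field=lambda y: np.array([0.0, 1.0, 0.0]))
```

On the sphere the exponential map is exact, so this step stays on the manifold by construction; the paper's general statement only requires a first-order approximation of the flow along X̂.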
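The Experiment Setup and Dataset Splits rows report α = 0.01, 100 particles, a random 80%/20% Covertype split, and AdaGrad with momentum for SVGD. Below is a hedged sketch of those settings in NumPy/scikit-learn terms; the decayed-squared-gradient update mirrors common SVGD reference implementations, and the data arrays are synthetic stand-ins, not the authors' code or data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

alpha = 0.01       # step size fixed for both RSVGD and SVGD in the experiments
n_particles = 100  # number of particles used by both methods

def adagrad_momentum_step(theta, grad, hist, step=alpha, decay=0.9, fudge=1e-6):
    """AdaGrad-with-momentum style particle update (decayed squared-gradient
    history), in the spirit of common SVGD implementations; a sketch only."""
    hist = decay * hist + (1.0 - decay) * grad ** 2
    theta = theta + step * grad / (fudge + np.sqrt(hist))
    return theta, hist

# Synthetic stand-in for the Covertype arrays, only to show the 80%/20% split.
X = np.random.randn(1000, 54)            # Covertype has 54 features
y = np.random.randint(0, 2, size=1000)   # binary labels, as in a logistic-regression task
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

The sketch separates the split and the optimizer step because the report states RSVGD uses plain gradient steps on the flow approximation, while only SVGD uses the adaptive update above.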