Riemannian Stein Variational Gradient Descent for Bayesian Inference

Authors: Chang Liu, Jun Zhu

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show the advantages over SVGD of exploring distribution geometry, and the advantages of particle-efficiency, iteration-effectiveness and approximation flexibility over other inference methods on Riemann manifolds.
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., TNList Lab; Center for Bio-Inspired Computing Research; State Key Lab for Intell. Tech. & Systems, Tsinghua University, Beijing, China
Pseudocode | No | The paper presents mathematical derivations and descriptions of the algorithm's dynamics and updates (e.g., a 1st-order approximation of the flow is $y(t + \epsilon) = \mathrm{Exp}_{y(t)}\big(\epsilon\, \hat{X}(y(t))\big)$), but it does not include a formally structured pseudocode block or an algorithm listing. A minimal sketch of this update step is given after the table.
Open Source Code | Yes | Codes and data available at http://ml.cs.tsinghua.edu.cn/changliu/rsvgd/
Open Datasets | Yes | We use the Splice dataset (1,000 training entries, 60 features), one of the benchmark datasets compiled by Mika et al. (1999), and the Covertype dataset (581,012 entries, 54 features) also used by Liu and Wang (2016). We run all the methods on the 20News-different dataset (1,666 training entries, 5,000 features) with default hyperparameters the same as in (Liu, Zhu, and Song 2016).
Dataset Splits | Yes | Each run on Covertype uses a random train (80%) / test (20%) split as in (Liu and Wang 2016); see the split sketch after the table.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing infrastructure used for the experiments.
Software Dependencies | No | The paper does not explicitly mention any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We fix α = 0.01 and use 100 particles for both methods. Each run on Covertype uses a random train (80%) / test (20%) split as in (Liu and Wang 2016). RSVGD updates particles by the aforementioned 1st-order flow approximation, which is effectively vanilla gradient descent, while SVGD uses the recommended AdaGrad with momentum (an optimizer sketch follows the table). We run all the methods on the 20News-different dataset (1,666 training entries, 5,000 features) with default hyperparameters the same as in (Liu, Zhu, and Song 2016).
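Since the paper provides no pseudocode, the quoted first-order update can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: it uses the Euclidean special case (identity metric, exponential map reduced to vector addition), a fixed-bandwidth RBF kernel, and hypothetical function names; the Riemannian metric terms of the actual RSVGD vector field are omitted, so the direction reduces to a plain SVGD-style update.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    """Pairwise RBF kernel values and gradients w.r.t. the first argument."""
    diffs = X[:, None, :] - X[None, :, :]                      # (n, n, d), x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))    # (n, n)
    gradK = -diffs / h ** 2 * K[:, :, None]                    # grad_{x_i} k(x_i, x_j)
    return K, gradK

def stein_vector_field(X, grad_logp):
    """Kernelized Stein direction; RSVGD reduces to this form only in the
    Euclidean, identity-metric case assumed for this sketch."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X)
    return (K @ grad_logp + gradK.sum(axis=0)) / n

def exp_map(y, v):
    """Exponential map; on R^d it is addition. Replace with the manifold's
    true Exp_y(v) for a non-Euclidean geometry."""
    return y + v

def rsvgd_step(X, grad_logp, eps=1e-2):
    """1st-order flow approximation: y(t + eps) = Exp_{y(t)}(eps * X_hat(y(t)))."""
    X_hat = stein_vector_field(X, grad_logp)
    return np.stack([exp_map(y, eps * v) for y, v in zip(X, X_hat)])
```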
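The Covertype split is straightforward to redo. A minimal sketch, assuming scikit-learn's copy of the dataset; the label binarization used by Liu and Wang (2016) and the per-run seed handling are not specified here and are left out.

```python
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split

# Random 80%/20% train-test split of Covertype (581,012 entries, 54 features),
# redrawn for each run; the paper states only the split ratio.
X, y = fetch_covtype(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```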
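The setup contrasts plain gradient steps for RSVGD with "AdaGrad with momentum" for the SVGD baseline. A minimal optimizer sketch follows; the decay constant, fudge factor, and step size are assumptions (the paper only names the optimizer), and `init_particles` / `grad_log_posterior` in the usage comment are hypothetical helpers.

```python
import numpy as np

class AdaGradWithMomentum:
    """AdaGrad with an exponential moving average of squared gradients.
    The 0.9 decay and 1e-6 fudge factor are assumptions, not values
    quoted in the paper."""
    def __init__(self, stepsize=1e-2, decay=0.9, fudge=1e-6):
        self.stepsize, self.decay, self.fudge = stepsize, decay, fudge
        self.hist = None

    def step(self, particles, direction):
        if self.hist is None:
            self.hist = direction ** 2
        else:
            self.hist = self.decay * self.hist + (1 - self.decay) * direction ** 2
        return particles + self.stepsize * direction / (self.fudge + np.sqrt(self.hist))

# Usage sketch with 100 particles (hypothetical helpers):
# particles = init_particles(n=100, d=dim)
# opt = AdaGradWithMomentum()
# for _ in range(n_iters):
#     direction = stein_vector_field(particles, grad_log_posterior(particles))
#     particles = opt.step(particles, direction)
```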