Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Authors: Qiang Liu, Dilin Wang

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Qiang Liu, Dilin Wang; Department of Computer Science, Dartmouth College, Hanover, NH 03755; {qiang.liu, dilin.wang.gr}@dartmouth.edu |
| Pseudocode | Yes | Algorithm 1: Bayesian Inference via Variational Gradient Descent (a minimal sketch of the update appears after the table) |
| Open Source Code | Yes | Our code is available at https://github.com/DartML/Stein-Variational-Gradient-Descent. |
| Open Datasets | Yes | We compared our algorithm with the no-U-turn sampler (NUTS) [29] and non-parametric variational inference (NPV) [5] on the 8 datasets (N > 500) used in Gershman et al. [5]. ... We further test the binary Covertype dataset with 581,012 data points and 54 features. |
| Dataset Splits | Yes | We partition the data into 80% for training and 20% for testing and average on 50 random trials. ... we select a using a validation set within the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as AdaGrad, MATLAB, and RELU(x), but does not provide version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | For all our experiments, we use RBF kernel k(x, x′) = exp(−(1/h)‖x − x′‖₂²), and take the bandwidth to be h = med²/log n ... We use AdaGrad for step size and initialize the particles using the prior distribution unless otherwise specified. ... A mini-batch size of 50 is used for all the algorithms. ... We use neural networks with one hidden layer, and take 50 hidden units for most datasets, except that we take 100 units for Protein and Year. (See the toy driver after the table.) |
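Because the paper's Algorithm 1, together with the kernel and bandwidth quoted above, fully determines the particle update, a minimal NumPy sketch of one SVGD direction follows. The function name `svgd_direction` and its interface are our own, not the authors' (their reference implementation lives in the linked repository); the sketch assumes n > 1 particles so the median-trick bandwidth h = med²/log n is well defined.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def svgd_direction(particles, grad_logp):
    """SVGD update direction phi*(x_i) for all particles (hypothetical helper).

    particles : (n, d) array holding the current particles {x_i}
    grad_logp : (n, d) array of grad_x log p(x) evaluated at each particle
    """
    n = particles.shape[0]
    sq_pd = pdist(particles, metric="sqeuclidean")  # pairwise squared distances
    sq_dists = squareform(sq_pd)
    # Median trick from the paper: h = med^2 / log n, which makes
    # sum_j k(x_i, x_j) approx n * exp(-log n) = 1 for each particle.
    h = np.median(sq_pd) / np.log(n)
    # RBF kernel k(x, x') = exp(-(1/h) ||x - x'||^2)
    K = np.exp(-sq_dists / h)
    # Driving term: (1/n) sum_j k(x_j, x_i) grad log p(x_j)
    drive = K @ grad_logp / n
    # Repulsive term: (1/n) sum_j grad_{x_j} k(x_j, x_i)
    #               = (2/(n*h)) sum_j (x_i - x_j) k(x_j, x_i)
    repulse = (2.0 / (n * h)) * (K.sum(axis=1, keepdims=True) * particles - K @ particles)
    return drive + repulse
```

The first term drives the particles toward high-probability regions of the target, while the second acts as a repulsive force that keeps them from collapsing onto a single mode; this repulsion is what distinguishes SVGD from running n independent gradient ascents on log p.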
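To mirror the rest of the stated setup (AdaGrad step sizes, particles initialized from the prior), here is a toy driver on a 1-D Gaussian target, reusing `svgd_direction` from the sketch above. The prior, master step size, fudge factor, and iteration count are illustrative assumptions, not values from the paper.

```python
# Toy target: p(x) = N(x; 2, 1), so grad log p(x) = -(x - 2).
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 3.0, size=(100, 1))  # initialize from an assumed broad prior

accum = np.zeros_like(particles)  # AdaGrad accumulator of squared updates
eps, fudge = 0.1, 1e-6            # master step size / stability constant (assumed)

for _ in range(500):
    phi = svgd_direction(particles, -(particles - 2.0))
    accum += phi ** 2
    particles += eps * phi / (fudge + np.sqrt(accum))

print(particles.mean(), particles.std())  # roughly 2.0 and 1.0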