Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Authors: Qiang Liu, Dilin Wang
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Qiang Liu, Dilin Wang, Department of Computer Science, Dartmouth College, Hanover, NH 03755. {qiang.liu, dilin.wang.gr}@dartmouth.edu |
| Pseudocode | Yes | Algorithm 1 Bayesian Inference via Variational Gradient Descent |
| Open Source Code | Yes | Our code is available at https://github.com/DartML/Stein-Variational-Gradient-Descent. |
| Open Datasets | Yes | We compared our algorithm with the no-U-turn sampler (NUTS) [29] and non-parametric variational inference (NPV) [5] on the 8 datasets (N > 500) used in Gershman et al. [5]... We further test the binary Covertype dataset with 581,012 data points and 54 features. |
| Dataset Splits | Yes | We partition the data into 80% for training and 20% for testing and average on 50 random trials. ... we select a using a validation set within the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like "AdaGrad", "MATLAB", and "RELU(x)" but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | For all our experiments, we use the RBF kernel k(x, x') = exp(-(1/h)\|\|x - x'\|\|^2), and take the bandwidth to be h = med^2 / log n... We use AdaGrad for step size and initialize the particles using the prior distribution unless otherwise specified. ... A mini-batch size of 50 is used for all the algorithms. ... We use neural networks with one hidden layer, and take 50 hidden units for most datasets, except that we take 100 units for Protein and Year. |
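The setup quoted above (RBF kernel with the median-trick bandwidth h = med²/log n, driving the particle update of Algorithm 1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released implementation: the function name `svgd_step`, the plain fixed step size (the paper uses AdaGrad), and the small epsilon added for numerical safety are all choices made here for clarity.

```python
import numpy as np

def svgd_step(X, grad_logp, stepsize=0.1):
    """One Stein variational gradient descent update.

    X         : (n, d) array of particles.
    grad_logp : callable mapping (n, d) particles to (n, d) score values
                grad_x log p(x), evaluated at each particle.
    """
    n = X.shape[0]
    # Pairwise squared distances between particles.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # (n, n)
    # Median-trick bandwidth from the paper: h = med^2 / log n.
    med = np.median(np.sqrt(sq))
    h = med ** 2 / max(np.log(n), 1e-8) + 1e-8
    K = np.exp(-sq / h)  # RBF kernel matrix k(x_j, x_i)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) for each particle i.
    grad_K = (2.0 / h) * (X * K.sum(axis=1, keepdims=True) - K @ X)
    # phi*(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K @ grad_logp(X) + grad_K) / n
    return X + stepsize * phi
```

As a usage check, transporting particles initialized far from a standard normal target (score function grad log p(x) = -x) pulls their mean toward 0 while the kernel-gradient term keeps them spread out rather than collapsing to the mode.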