Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Authors: Qiang Liu, Dilin Wang
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Qiang Liu, Dilin Wang, Department of Computer Science, Dartmouth College, Hanover, NH 03755, EMAIL |
| Pseudocode | Yes (see sketch below the table) | Algorithm 1 Bayesian Inference via Variational Gradient Descent |
| Open Source Code | Yes | Our code is available at https://github.com/DartML/Stein-Variational-Gradient-Descent. |
| Open Datasets | Yes | We compared our algorithm with the no-U-turn sampler (NUTS) [29] and non-parametric variational inference (NPV) [5] on the 8 datasets (N > 500) used in Gershman et al. [5]... We further test the binary Covertype dataset with 581,012 data points and 54 features. |
| Dataset Splits | Yes | We partition the data into 80% for training and 20% for testing and average on 50 random trials. ... we select a using a validation set within the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like "AdaGrad", "MATLAB", and "RELU(x)" but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes (see sketch below the table) | For all our experiments, we use the RBF kernel k(x, x') = exp(−(1/h)‖x − x'‖₂²), and take the bandwidth to be h = med²/log n... We use AdaGrad for step size and initialize the particles using the prior distribution unless otherwise specified. ... A mini-batch size of 50 is used for all the algorithms. ... We use neural networks with one hidden layer, and take 50 hidden units for most datasets, except that we take 100 units for Protein and Year. |
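
For concreteness, here is a minimal NumPy sketch of the particle update described by the Pseudocode and Experiment Setup rows above: one step of Algorithm 1 with the RBF kernel and the median-heuristic bandwidth h = med²/log n quoted from the paper. The function name `svgd_step` and the plain fixed step size are illustrative assumptions (the paper reports using AdaGrad); this is not the authors' reference implementation, which is available at the repository linked in the Open Source Code row.

```python
import numpy as np

def svgd_step(x, grad_logp, stepsize=1e-2, h=-1.0):
    """One Stein variational gradient descent update (Algorithm 1 sketch).

    x         : (n, d) array of particles.
    grad_logp : (n, d) array with rows grad_x log p(x_j) at each particle.
    h         : RBF bandwidth; h <= 0 triggers the paper's median
                heuristic h = med^2 / log n.
    """
    n = x.shape[0]
    # Pairwise squared distances ||x_i - x_j||^2, shape (n, n).
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    if h <= 0:
        # med^2 is the median of the squared pairwise distances;
        # log(n + 1) avoids division by zero for a single particle.
        h = np.median(sq_dists) / np.log(n + 1)
    k = np.exp(-sq_dists / h)  # k(x_j, x_i) = exp(-||x_j - x_i||^2 / h)
    # sum_j grad_{x_j} k(x_j, x_i) for the RBF kernel above.
    grad_k = (2.0 / h) * (k.sum(axis=1, keepdims=True) * x - k @ x)
    # phi*(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]
    phi = (k @ grad_logp + grad_k) / n
    return x + stepsize * phi
```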
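
As a quick sanity check under the same assumptions, the sketch can be run on a toy 1-D standard normal target, for which grad log p(x) = −x (the target and all settings here are illustrative, not from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
# Initialize particles far from the target mode.
particles = rng.normal(loc=-5.0, scale=1.0, size=(100, 1))
for _ in range(500):
    # Score of a standard normal: grad log p(x) = -x.
    particles = svgd_step(particles, grad_logp=-particles, stepsize=0.1)
# particles.mean() should approach 0 and particles.var() should approach 1.
```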