Stein Variational Gradient Descent With Matrix-Valued Kernels
Authors: Dilin Wang, Ziyang Tang, Chandrajit Bajaj, Qiang Liu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks. We empirically evaluate both Newton and Fisher based extensions of SVGD on various practical benchmarks, including Bayesian neural regression and sentence classification, on which our methods show significant improvement over vanilla SVGD and other baseline approaches. |
| Researcher Affiliation | Academia | Department of Computer Science, UT Austin |
| Pseudocode | Yes | Algorithm 1 Stein Variational Gradient Descent with Matrix-valued Kernels (Matrix SVGD) |
| Open Source Code | Yes | Our code is available at https://github.com/dilinwang820/matrix_svgd. |
| Open Datasets | Yes | We consider the binary Covtype dataset with 581,012 data points and 54 features. We partition the data into 70% for training, 10% for validation and 20% for testing. ... We apply our matrix SVGD on Bayesian neural network regression on UCI datasets. For all experiments, we use a two-layer neural network with 50 hidden units with ReLU activation functions. We assign isotropic Gaussian priors to the neural network weights. All datasets are randomly partitioned into 90% for training and 10% for testing. |
| Dataset Splits | Yes | We partition the data into 70% for training, 10% for validation and 20% for testing. We choose the best learning rate from [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0] for each method on the validation set. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU models, memory). It only mentions general cloud support from Google Cloud and Amazon Web Services (AWS) in the Acknowledgement section, which is not a specific hardware specification for reproducibility. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). It mentions using 'Adagrad' and 'Adam optimizer' but without specific versions. |
| Experiment Setup | Yes | Following Liu & Wang (2016), we choose the bandwidth of the Gaussian RBF kernels using the standard median trick and use Adagrad (Duchi et al., 2011) for stepsize. We use 50 particles for all the cases. We use 20 particles. We use Adam optimizer with a mini-batch size of 100; for large dataset such as Year, we set the mini-batch size to be 1000. We choose the best learning rate from [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0] for each method on the validation set. We use n = 10 particles for all methods. We use a mini-batch size of 50 and run all the algorithms for 20 epochs with early stop. |
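The setup row above references the standard SVGD recipe of Liu & Wang (2016): a Gaussian RBF kernel whose bandwidth is chosen by the median trick, applied to a set of interacting particles. As a point of reference for that baseline (not the paper's matrix-valued-kernel extension), here is a minimal NumPy sketch of one vanilla SVGD update; the toy target, particle count of 50, and constant step size are illustrative choices (the paper itself uses Adagrad step sizes).

```python
import numpy as np

def svgd_update(X, grad_logp):
    """Vanilla SVGD step direction with an RBF kernel whose bandwidth
    is set by the median trick, following Liu & Wang (2016)."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * X @ X.T             # pairwise squared distances
    h = np.median(d2) / np.log(n + 1)          # median trick bandwidth
    K = np.exp(-d2 / h)                        # RBF kernel matrix
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i)
    #   = (2/h) * (x_i * sum_j K_ij - sum_j K_ij x_j)
    dK = (X * K.sum(axis=1, keepdims=True) - K @ X) * (2.0 / h)
    # Average of kernel-smoothed score (driving) and repulsion
    return (K @ grad_logp(X) + dK) / n

# Toy example: move 50 particles toward a 1-D Gaussian N(2, 1).
# Plain gradient steps keep the sketch short; the paper uses Adagrad.
rng = np.random.default_rng(0)
X = rng.normal(-5.0, 1.0, size=(50, 1))        # 50 particles, as in the paper
grad_logp = lambda x: -(x - 2.0)               # score function of N(2, 1)
for _ in range(1000):
    X += 0.05 * svgd_update(X, grad_logp)
```

After the loop the particles cluster around the target mean while the repulsive term keeps them spread out, which is the behavior the matrix-valued kernels in the paper are designed to improve by preconditioning with Newton or Fisher information.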