Stein Variational Gradient Descent With Matrix-Valued Kernels

Authors: Dilin Wang, Ziyang Tang, Chandrajit Bajaj, Qiang Liu

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks. We empirically evaluate both Newton and Fisher based extensions of SVGD on various practical benchmarks, including Bayesian neural regression and sentence classification, on which our methods show significant improvement over vanilla SVGD and other baseline approaches.
Researcher Affiliation | Academia | Department of Computer Science, UT Austin
Pseudocode | Yes | Algorithm 1: Stein Variational Gradient Descent with Matrix-valued Kernels (Matrix SVGD). (An illustrative sketch of this style of update is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/dilinwang820/matrix_svgd.
Open Datasets | Yes | We consider the binary Covtype dataset with 581,012 data points and 54 features. We partition the data into 70% for training, 10% for validation and 20% for testing. ... We apply our matrix SVGD on Bayesian neural network regression on UCI datasets. For all experiments, we use a two-layer neural network with 50 hidden units with ReLU activation functions. We assign isotropic Gaussian priors to the neural network weights. All datasets are randomly partitioned into 90% for training and 10% for testing.
Dataset Splits | Yes | We partition the data into 70% for training, 10% for validation and 20% for testing. We choose the best learning rate from [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0] for each method on the validation set.
Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU models, memory). It only mentions general cloud support from Google Cloud and Amazon Web Services (AWS) in the Acknowledgement section, which is not a specific hardware specification for reproducibility.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). It mentions using 'Adagrad' and 'Adam optimizer' but without specific versions.
Experiment Setup | Yes | Following Liu & Wang (2016), we choose the bandwidth of the Gaussian RBF kernels using the standard median trick and use Adagrad (Duchi et al., 2011) for stepsize. We use 50 particles for all the cases. ... We use 20 particles. ... We use Adam optimizer with a mini-batch size of 100; for large datasets such as Year, we set the mini-batch size to be 1000. We choose the best learning rate from [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0] for each method on the validation set. ... We use n = 10 particles for all methods. We use a mini-batch size of 50 and run all the algorithms for 20 epochs with early stop. (A sketch of the median-trick bandwidth choice is also given after the table.)
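
For context on the Pseudocode row: Algorithm 1 generalizes the vanilla SVGD direction by replacing the scalar kernel with a matrix-valued one. The snippet below is a minimal NumPy sketch, assuming the simplest special case of a constant preconditioning matrix P applied to a Gaussian RBF kernel; the names (matrix_svgd_step, grad_logp, P, bandwidth) are illustrative and do not come from the paper's released code, which also covers the richer Newton- and Fisher-based kernel constructions.

```python
import numpy as np

def matrix_svgd_step(particles, grad_logp, P, bandwidth, stepsize=0.1):
    """One SVGD-style update of n particles (shape [n, d]) under the
    matrix-valued kernel K(x, x') = P * k(x, x'), with k a Gaussian RBF
    kernel and P a fixed d x d preconditioning matrix (illustrative only)."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]   # x_i - x_j, shape [n, n, d]
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # [n, n]
    k = np.exp(-sq_dists / bandwidth)                        # scalar RBF kernel
    grads = grad_logp(particles)                             # scores at each particle, [n, d]

    # Vanilla SVGD direction: kernel-weighted scores plus a repulsion term.
    drift = k @ grads                                        # [n, d]
    repulsion = 2.0 * np.sum(k[:, :, None] * diffs, axis=1) / bandwidth
    phi = (drift + repulsion) / n

    # Matrix-valued kernel with a constant P: precondition the whole direction.
    return particles + stepsize * phi @ P.T
```

For example, P could be (an estimate of) the inverse of an averaged Hessian or Fisher matrix, giving the update a Newton-like or natural-gradient-like flavor; with P equal to the identity, the step reduces to vanilla SVGD.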
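
The Experiment Setup row also cites the standard median trick of Liu & Wang (2016) for choosing the RBF bandwidth. The sketch below shows one common form of that heuristic (bandwidth set from the median pairwise squared distance, scaled by log n); the exact scaling used in the authors' implementation may differ.

```python
import numpy as np

def median_trick_bandwidth(particles):
    """Heuristic RBF bandwidth from the median of pairwise squared distances.

    Assumed convention: h = median(||x_i - x_j||^2) / log(n + 1), chosen so
    that sum_j k(x_i, x_j) is roughly of order 1 for each particle.
    """
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]   # [n, n, d]
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # [n, n]
    return np.median(sq_dists) / np.log(n + 1.0)
```

This value can be passed as the bandwidth argument of the matrix_svgd_step sketch above; recomputing it at every iteration, as is common in SVGD implementations, keeps the kernel scale adapted to the current particle spread.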