Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Authors: Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes

NeurIPS 2018

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental
LLM Response: "Lastly, we conduct experiments on a variety of supervised regression and classification tasks. We show empirically that our work significantly improves predictions on medium-large datasets at a lower computational cost."
Researcher Affiliation: Collaboration
LLM Response: Marton Havasi, Department of Engineering, University of Cambridge (mh740@cam.ac.uk); José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge, Microsoft Research, and the Alan Turing Institute (jmh233@cam.ac.uk); Juan José Murillo-Fuentes, Department of Signal Theory and Communications, University of Sevilla (murillo@us.es)
Pseudocode: Yes
LLM Response: "Algorithm 1 on the left side of Figure 3 presents the pseudocode for Moving Window MCEM."
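
To make the procedure concrete, here is a minimal Python sketch of a Moving Window MCEM loop in the spirit of Algorithm 1. The callbacks sghmc_step and grad_log_joint are hypothetical stand-ins (not names from the paper's code) for the SGHMC transition kernel and the gradient of the log joint with respect to the hyperparameters; the window bookkeeping follows a plain reading of the pseudocode rather than the authors' exact implementation.

    import random

    def moving_window_mcem(theta, u0, window_size, n_iter, lr,
                           sghmc_step, grad_log_joint):
        # Sketch of a Moving Window MCEM loop. `theta` is a list of
        # hyperparameter values, `u0` an initial sample of the latent
        # (inducing) variables. `sghmc_step` and `grad_log_joint` are
        # hypothetical callbacks standing in for the SGHMC transition
        # and the gradient of log p(y, u | theta) w.r.t. theta.
        u = u0
        window = []
        for _ in range(n_iter):
            # E-step: draw a new SGHMC sample; the oldest sample in the
            # window is discarded once the window is full.
            u = sghmc_step(u, theta)
            window.append(u)
            if len(window) > window_size:
                window.pop(0)

            # M-step: one stochastic gradient ascent step on the log
            # joint (equivalently, gradient descent on its negative),
            # using a sample drawn from the window.
            u_draw = random.choice(window)
            grad = grad_log_joint(theta, u_draw)
            theta = [t + lr * g for t, g in zip(theta, grad)]
        return theta, window
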
Open Source Code: Yes
LLM Response: "Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp."
Open Datasets: Yes
LLM Response: "We conducted experiments on 9 UCI benchmark datasets ranging from small (~500 datapoints) to large (~500,000) for a fair comparison against the baseline."
Dataset Splits: No
LLM Response: "We exercised a random 0.8-0.2 train-test split." Only this random 80/20 split is described; no fixed split indices or random seeds are reported, which presumably explains the negative result.
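
A split of the kind the paper describes is straightforward to reproduce up to randomness; a minimal sketch follows, where the fixed seed is our addition (the paper reports none) so the example is deterministic.

    import numpy as np

    def random_split(X, y, train_frac=0.8, seed=0):
        # Random 0.8/0.2 train-test split as described in the paper.
        # The seed is an assumption added here for determinism; the
        # paper itself does not report one.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n_train = int(train_frac * len(X))
        train, test = idx[:n_train], idx[n_train:]
        return X[train], y[train], X[test], y[test]
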
Hardware Specification: No
LLM Response: The paper does not provide any specific details regarding the hardware used for running the experiments.
Software Dependencies: No
LLM Response: "Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp." TensorFlow is named, but no version numbers or a full dependency list are given.
Experiment Setup: Yes
LLM Response: "Following Salimbeni and Deisenroth [2017], in all of the models, we set the learning rate to the default 0.01, the minibatch size to 10,000 and the number of iterations to 20,000. One iteration involves drawing a sample from the window and updating the hyperparameters by gradient descent as illustrated in Algorithm 1 in the left side of Figure 3. The depth varied from 0 hidden layers up to 4 with 10 nodes per layer. The covariance function was a standard squared exponential function with separate lengthscales per dimension."
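
For convenience, the reported settings are collected below as a Python dictionary, together with a minimal NumPy version of a squared exponential kernel with separate (ARD) lengthscales per dimension. The dictionary layout and the function name are ours, not the authors' configuration format.

    import numpy as np

    # Reported settings, collected from the quoted setup.
    SETUP = {
        "learning_rate": 0.01,
        "minibatch_size": 10_000,
        "n_iterations": 20_000,
        "hidden_layers": [0, 1, 2, 3, 4],  # depth varied from 0 to 4
        "nodes_per_layer": 10,
    }

    def squared_exponential_ard(x1, x2, lengthscales, variance=1.0):
        # k(x, x') = variance * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2),
        # with a separate lengthscale l_d per input dimension (ARD).
        # x1 has shape (n, d), x2 has shape (m, d), lengthscales (d,).
        diff = (x1[:, None, :] - x2[None, :, :]) / lengthscales
        return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))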