Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo

Authors: Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes

NeurIPS 2018

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental
LLM Response: "Lastly, we conduct experiments on a variety of supervised regression and classification tasks. We show empirically that our work significantly improves predictions on medium-large datasets at a lower computational cost."
Researcher Affiliation: Collaboration
LLM Response: Marton Havasi, Department of Engineering, University of Cambridge (mh740@cam.ac.uk); José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge, Microsoft Research, and the Alan Turing Institute (jmh233@cam.ac.uk); Juan José Murillo-Fuentes, Department of Signal Theory and Communications, University of Sevilla (murillo@us.es)
Pseudocode: Yes
LLM Response: "Algorithm 1 on the left side of Figure 3 presents the pseudocode for Moving Window MCEM."
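
To make the procedure concrete, here is a minimal Python sketch of a Moving Window MCEM loop in the spirit of Algorithm 1. The callbacks sghmc_step and grad_log_joint are hypothetical stand-ins (not names from the paper's code) for the SGHMC transition kernel and the gradient of the log joint with respect to the hyperparameters; the window bookkeeping follows a plain reading of the pseudocode rather than the authors' exact implementation.

    import random

    def moving_window_mcem(theta, u0, window_size, n_iter, lr,
                           sghmc_step, grad_log_joint):
        # Sketch of a Moving Window MCEM loop. `theta` is a list of
        # hyperparameter values, `u0` an initial sample of the latent
        # (inducing) variables. `sghmc_step` and `grad_log_joint` are
        # hypothetical callbacks standing in for the SGHMC transition
        # and the gradient of log p(y, u | theta) w.r.t. theta.
        u = u0
        window = []
        for _ in range(n_iter):
            # E-step: draw a new SGHMC sample; the oldest sample in the
            # window is discarded once the window is full.
            u = sghmc_step(u, theta)
            window.append(u)
            if len(window) > window_size:
                window.pop(0)

            # M-step: one stochastic gradient ascent step on the log
            # joint (equivalently, gradient descent on its negative),
            # using a sample drawn from the window.
            u_draw = random.choice(window)
            grad = grad_log_joint(theta, u_draw)
            theta = [t + lr * g for t, g in zip(theta, grad)]
        return theta, window
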
Open Source Code: Yes
LLM Response: "Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp."
Open Datasets: Yes
LLM Response: "We conducted experiments on 9 UCI benchmark datasets ranging from small (~500 datapoints) to large (~500,000) for a fair comparison against the baseline."
Dataset Splits: No
LLM Response: "We exercised a random 0.8-0.2 train-test split." Only this random 80/20 split is described; no fixed split indices or random seeds are reported, which presumably explains the negative result.
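
A split of the kind the paper describes is straightforward to reproduce up to randomness; a minimal sketch follows, where the fixed seed is our addition (the paper reports none) so the example is deterministic.

    import numpy as np

    def random_split(X, y, train_frac=0.8, seed=0):
        # Random 0.8/0.2 train-test split as described in the paper.
        # The seed is an assumption added here for determinism; the
        # paper itself does not report one.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(len(X))
        n_train = int(train_frac * len(X))
        train, test = idx[:n_train], idx[n_train:]
        return X[train], y[train], X[test], y[test]
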
Hardware Specification: No
LLM Response: The paper does not provide any specific details regarding the hardware used for running the experiments.
Software Dependencies: No
LLM Response: "Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp." TensorFlow is named, but no version numbers or a full dependency list are given.
Experiment Setup: Yes
LLM Response: "Following Salimbeni and Deisenroth [2017], in all of the models, we set the learning rate to the default 0.01, the minibatch size to 10,000 and the number of iterations to 20,000. One iteration involves drawing a sample from the window and updating the hyperparameters by gradient descent as illustrated in Algorithm 1 in the left side of Figure 3. The depth varied from 0 hidden layers up to 4 with 10 nodes per layer. The covariance function was a standard squared exponential function with separate lengthscales per dimension."
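
For convenience, the reported settings are collected below as a Python dictionary, together with a minimal NumPy version of a squared exponential kernel with separate (ARD) lengthscales per dimension. The dictionary layout and the function name are ours, not the authors' configuration format.

    import numpy as np

    # Reported settings, collected from the quoted setup.
    SETUP = {
        "learning_rate": 0.01,
        "minibatch_size": 10_000,
        "n_iterations": 20_000,
        "hidden_layers": [0, 1, 2, 3, 4],  # depth varied from 0 to 4
        "nodes_per_layer": 10,
    }

    def squared_exponential_ard(x1, x2, lengthscales, variance=1.0):
        # k(x, x') = variance * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2),
        # with a separate lengthscale l_d per input dimension (ARD).
        # x1 has shape (n, d), x2 has shape (m, d), lengthscales (d,).
        diff = (x1[:, None, :] - x2[None, :, :]) / lengthscales
        return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))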