Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo
Authors: Marton Havasi, José Miguel Hernández-Lobato, Juan José Murillo-Fuentes
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we conduct experiments on a variety of supervised regression and classification tasks. We show empirically that our work significantly improves predictions on medium-large datasets at a lower computational cost. |
| Researcher Affiliation | Collaboration | Marton Havasi, Department of Engineering, University of Cambridge (mh740@cam.ac.uk); José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge, Microsoft Research, Alan Turing Institute (jmh233@cam.ac.uk); Juan José Murillo-Fuentes, Department of Signal Theory and Communications, University of Sevilla (murillo@us.es) |
| Pseudocode | Yes | Algorithm 1 on the left side of Figure 3 presents the pseudocode for Moving Window MCEM. |
| Open Source Code | Yes | Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp. |
| Open Datasets | Yes | We conducted experiments on 9 UCI benchmark datasets ranging from small (~500 datapoints) to large (~500,000) for a fair comparison against the baseline. |
| Dataset Splits | No | We exercised a random 0.8-0.2 train-test split. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | Our code is based on the Tensorflow [Abadi et al., 2015] computing library and it is publicly available at https://github.com/cambridge-mlg/sghmc_dgp. |
| Experiment Setup | Yes | Following Salimbeni and Deisenroth [2017], in all of the models, we set the learning rate to the default 0.01, the minibatch size to 10,000 and the number of iterations to 20,000. One iteration involves drawing a sample from the window and updating the hyperparameters by gradient descent, as illustrated in Algorithm 1 on the left side of Figure 3. The depth varied from 0 hidden layers up to 4, with 10 nodes per layer. The covariance function was a standard squared exponential function with separate lengthscales per dimension. |
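
The Pseudocode row above refers to Algorithm 1, Moving Window MCEM. Below is a minimal sketch of the moving-window idea, under stated assumptions: `sghmc_step` and `grad_log_joint` are hypothetical placeholders for the SGHMC transition kernel and the gradient of the log joint, and the window size of 300 is an illustrative choice, not a value quoted from the paper.

```python
import random

def moving_window_mcem(init_sample, hypers, sghmc_step, grad_log_joint,
                       n_iters=20_000, window_size=300, lr=0.01):
    """Sketch of Moving Window MCEM (function names and window size are assumptions).

    E-step: advance the SGHMC chain and keep the newest samples in a
    fixed-size window. M-step: draw one stored sample uniformly from the
    window and take a stochastic gradient step on the hyperparameters.
    """
    window = [init_sample]
    sample = init_sample
    for _ in range(n_iters):
        # E-step: one SGHMC transition of the posterior sample
        sample = sghmc_step(sample, hypers)
        window.append(sample)
        if len(window) > window_size:
            window.pop(0)  # drop the oldest sample from the window

        # M-step: gradient ascent on the log joint, using one sample from the window
        u = random.choice(window)
        grads = grad_log_joint(u, hypers)
        hypers = [h + lr * g for h, g in zip(hypers, grads)]
    return hypers, window
```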
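
The Dataset Splits row quotes a random 0.8-0.2 train-test split. A minimal NumPy sketch of such a split follows; the seed argument is an assumption, since the paper does not report one.

```python
import numpy as np

def random_split(X, y, train_frac=0.8, seed=None):
    """Random 0.8-0.2 train-test split as quoted above (seed is an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle example indices
    n_train = int(train_frac * len(X))     # 80% of the data for training
    return X[idx[:n_train]], y[idx[:n_train]], X[idx[n_train:]], y[idx[n_train:]]
```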
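
The Experiment Setup row fixes the learning rate, minibatch size, iteration count, depth range, and kernel. The sketch below records those quoted settings as constants and renders the squared exponential kernel with per-dimension lengthscales in NumPy; the authors' released code is TensorFlow-based, so this helper is illustrative rather than their implementation.

```python
import numpy as np

# Settings quoted in the Experiment Setup row (defaults from
# Salimbeni and Deisenroth [2017]).
LEARNING_RATE = 0.01
MINIBATCH_SIZE = 10_000
N_ITERATIONS = 20_000
DEPTHS = range(0, 5)      # 0 up to 4 hidden layers
NODES_PER_LAYER = 10

def rbf_ard(X1, X2, lengthscales, variance=1.0):
    """Squared exponential kernel with a separate lengthscale per input dimension."""
    diff = (X1[:, None, :] - X2[None, :, :]) / lengthscales  # shape (n, m, d)
    return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))
```

For example, `rbf_ard(X, X, np.ones(X.shape[1]))` returns the n-by-n kernel matrix with unit lengthscales in every dimension.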