Deep Gaussian Processes for Regression using Approximate Expectation Propagation
Authors: Thang Bui, Daniel Hernández-Lobato, José Hernández-Lobato, Yingzhen Li, Richard Turner
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks. |
| Researcher Affiliation | Academia | ¹University of Cambridge, ²Harvard University, ³Universidad Autónoma de Madrid |
| Pseudocode | No | The paper describes algorithms and procedures in paragraph text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We released our Theano and Python implementations on https://github.com/thangbui/deepGP_approx_EP. |
| Open Datasets | Yes | We use the ten datasets and train/test splits used by Hernández-Lobato and Adams (2015) and Gal and Ghahramani (2016): 1 split for the year dataset [N ≈ 500000, D = 90], 5 splits for the protein dataset [N ≈ 46000, D = 9], and 20 for the others. (A sketch of this splitting protocol appears after the table.) |
| Dataset Splits | No | The paper mentions train/test splits but does not explicitly specify a validation split percentage, sample counts, or a detailed validation methodology (e.g., k-fold cross-validation with a stated k) beyond reusing the splits from earlier papers. |
| Hardware Specification | No | The paper mentions that computation can be distributed on 'GPUs' but does not specify any particular GPU model, CPU type, memory, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions 'Theano' and 'Python' for implementation, and 'Adam' and 'Autograd' for specific functionalities, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | In all the experiments reported here, we use Adam with the default learning rate (Kingma & Ba, 2015) for optimising our objective function. We use an exponentiated quadratic kernel with ARD lengthscales for each layer. The hyperparameters and pseudo point locations are different between functions in each layer. ... We include the results for two settings of the number of inducing outputs, M = 50 and M = 100 respectively. Note that for the bigger datasets protein and year, we use M = 100 and M = 200... |
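
The split protocol quoted in the Open Datasets and Dataset Splits rows (random train/test splits reused from Hernández-Lobato and Adams, 2015, with 20 splits for most datasets) can be illustrated with a short sketch. This is a minimal illustration, not the authors' released code: the `make_splits` helper, the 90/10 train/test fraction, and the fixed seed are assumptions made here for clarity.

```python
import numpy as np

def make_splits(X, y, n_splits=20, test_fraction=0.1, seed=0):
    """Generate random train/test splits of a regression dataset.

    Hypothetical helper illustrating the kind of protocol referenced in the
    paper (splits reused from Hernandez-Lobato & Adams, 2015); the 90/10
    train/test fraction and the seed are assumptions, not taken from the paper.
    """
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    n_test = int(np.round(test_fraction * n))
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n)            # fresh random permutation per split
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        splits.append(((X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])))
    return splits
```

In practice the original per-split index files from the earlier papers would be reused directly, so results remain comparable across methods.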
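For the Experiment Setup row, the quoted configuration is an exponentiated quadratic kernel with ARD lengthscales per layer, M pseudo (inducing) points per layer, and all hyperparameters optimised with Adam at its default learning rate. The NumPy sketch below shows only the kernel and a toy pseudo-point layout; the function name `ard_exponentiated_quadratic`, the exact parameterisation, and the illustrative shapes are assumptions, not the released Theano implementation.

```python
import numpy as np

def ard_exponentiated_quadratic(X1, X2, lengthscales, signal_variance):
    """Exponentiated quadratic (RBF) kernel with ARD lengthscales.

    A minimal sketch of the per-layer kernel described in the paper:
    k(x, x') = s^2 * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2).
    The parameter names are assumptions made for this illustration.
    """
    Z1 = X1 / lengthscales               # scale each input dimension by its lengthscale
    Z2 = X2 / lengthscales
    sq_dist = (
        np.sum(Z1 ** 2, axis=1)[:, None]
        + np.sum(Z2 ** 2, axis=1)[None, :]
        - 2.0 * Z1 @ Z2.T
    )
    return signal_variance * np.exp(-0.5 * sq_dist)

# Per the quoted setup, each layer has its own kernel hyperparameters and
# M pseudo-point locations (M = 50 or 100; 100 or 200 for protein and year),
# all optimised with Adam at its default learning rate (0.001 in Kingma & Ba, 2015).
X = np.random.randn(5, 3)                # toy inputs (illustrative shapes)
Z = np.random.randn(50, 3)               # M = 50 pseudo-point locations (illustrative)
K = ard_exponentiated_quadratic(X, Z, lengthscales=np.ones(3), signal_variance=1.0)
```

The separate kernel hyperparameters and pseudo-point locations per function and per layer, as stated in the row above, would simply be additional instances of this parameterisation.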