Random Feature Expansions for Deep Gaussian Processes
Authors: Kurt Cutajar, Edwin V. Bonilla, Pietro Michiardi, Maurizio Filippone
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers. We extensively demonstrate the effectiveness of our proposal on a variety of regression and classification problems by comparing it with DNNs and other state-of-the-art approaches to infer DGPs. We evaluate our model by comparing it against relevant alternatives for both regression and classification, and assess its performance when applied to large-scale datasets. |
| Researcher Affiliation | Academia | (1) Department of Data Science, EURECOM, France; (2) School of Computer Science and Engineering, University of New South Wales, Australia. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that a competitor's code is available ('Code obtained from: github.com/thangbui/deepGP_approxEP') but does not provide an explicit statement or link for the source code of its own proposed method. |
| Open Datasets | Yes | We use the same experimental set-up for both regression and classification tasks using datasets from the UCI repository (Asuncion & Newman, 2007). We focus part of the experiments on large-scale problems, such as MNIST8M digit classification and the AIRLINE dataset. |
| Dataset Splits | No | The paper mentions 'withheld test data' and that 'The results are averaged over 3 folds for every dataset,' which implies a splitting strategy. However, it does not explicitly provide specific percentages for training, validation, and test splits, nor does it clearly define a separate validation set split needed for reproduction. |
| Hardware Specification | Yes | The experiments were launched on single nodes of a cluster of Intel Xeon E5-2630 CPUs having 32 cores and 128GB RAM. |
| Software Dependencies | No | The paper states that the model was implemented in 'TensorFlow (Abadi et al., 2015)' but does not specify a version number for TensorFlow or any other software dependencies, making it difficult to reproduce the exact software environment. |
| Experiment Setup | Yes | In the proposed DGP with an RBF kernel, we use 100 random features at every hidden layer to construct a multivariate GP with D_F^(l) = 3, and set the batch size to m = 200. We initially only use a single Monte Carlo sample, and halfway through the allocated optimization time, this is then increased to 100 samples. We employ the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.01, and in order to stabilize the optimization procedure, we fix the parameters Θ for 12,000 iterations, before jointly optimizing all parameters. For DGP-RBF and DGP-ARC, we use 500 random features, 50 GPs in the hidden layers, batch size of 1000, and Adam with a 0.001 learning rate. For the DNN baseline, we use a dropout rate of 0.5 at each hidden layer to provide regularization during training. |
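The experiment setup above hinges on the paper's core approximation: expanding the RBF kernel at each layer with random Fourier features (Rahimi & Recht, 2008). Below is a minimal NumPy sketch of that expansion; the function name, the unit lengthscale default, and the check at the end are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_random_features(X, n_features=100, lengthscale=1.0, seed=0):
    """Random Fourier feature map phi(x) such that phi(x) @ phi(x')
    approximates the RBF kernel exp(-||x - x'||^2 / (2 * lengthscale^2)).
    n_features=100 mirrors the per-layer setting reported in the paper."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral frequencies of the RBF kernel: Omega ~ N(0, lengthscale^-2 I)
    Omega = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    Z = X @ Omega
    # Stack cos/sin features; the 1/sqrt(n_features) scaling makes the
    # inner product an unbiased Monte Carlo estimate of the kernel.
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(n_features)

# Sanity check: with many features, Phi @ Phi.T converges to the exact kernel.
X = np.random.default_rng(1).normal(size=(5, 3))
Phi = rbf_random_features(X, n_features=5000)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

In the paper's DGP, the outputs of one such feature layer (weighted by variational weights) feed the inputs of the next, and the whole stack is trained with Adam on mini-batches, as the setup row describes.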