Random Feature Expansions for Deep Gaussian Processes

Authors: Kurt Cutajar, Edwin V. Bonilla, Pietro Michiardi, Maurizio Filippone

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers. We extensively demonstrate the effectiveness of our proposal on a variety of regression and classification problems by comparing it with DNNs and other state-of-the-art approaches to infer DGPs. We evaluate our model by comparing it against relevant alternatives for both regression and classification, and assess its performance when applied to large-scale datasets.
Researcher Affiliation | Academia | 1) Department of Data Science, EURECOM, France; 2) School of Computer Science and Engineering, University of New South Wales, Australia.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions that a competitor's code is available ('Code obtained from: github.com/thangbui/deepGP_approxEP') but does not provide an explicit statement or link for the source code of its own proposed method.
Open Datasets | Yes | We use the same experimental set-up for both regression and classification tasks using datasets from the UCI repository (Asuncion & Newman, 2007). We focus part of the experiments on large-scale problems, such as MNIST8M digit classification and the AIRLINE dataset.
Dataset Splits | No | The paper mentions 'withheld test data' and states that 'The results are averaged over 3 folds for every dataset,' which implies a splitting strategy. However, it does not provide specific percentages for training, validation, and test splits, nor does it define a separate validation set needed for reproduction (an illustrative 3-fold sketch is given after the table).
Hardware Specification | Yes | The experiments were launched on single nodes of a cluster of Intel Xeon E5-2630 CPUs having 32 cores and 128 GB RAM.
Software Dependencies | No | The paper states that the model was implemented in 'TensorFlow (Abadi et al., 2015)' but does not specify a version number for TensorFlow or any other software dependency, making it difficult to reproduce the exact software environment.
Experiment Setup | Yes | In the proposed DGP with an RBF kernel, we use 100 random features at every hidden layer to construct a multivariate GP with D_F^(l) = 3, and set the batch size to m = 200. We initially only use a single Monte Carlo sample, and halfway through the allocated optimization time, this is then increased to 100 samples. We employ the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.01, and in order to stabilize the optimization procedure, we fix the parameters Θ for 12,000 iterations, before jointly optimizing all parameters. For DGP-RBF and DGP-ARC, we use 500 random features, 50 GPs in the hidden layers, a batch size of 1000, and Adam with a 0.001 learning rate. We construct a DNN configured with a dropout rate of 0.5 at each hidden layer in order to provide regularization during training. (An illustrative random-feature layer sketch is given after the table.)
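
Since the paper reports results averaged over 3 folds but gives no explicit split percentages, the following is a minimal sketch of one plausible evaluation protocol. Shuffled folds, the random seed, and the absence of a separate validation split are assumptions, and the fit_and_score callable is a hypothetical placeholder for training a model and scoring it on the withheld fold.

```python
# Minimal sketch of a 3-fold evaluation protocol; fold construction and shuffling
# are assumptions, not the paper's stated recipe.
import numpy as np
from sklearn.model_selection import KFold

def average_over_folds(X, y, fit_and_score, n_folds=3, seed=0):
    """Average a test metric over n_folds train/test splits."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        # fit_and_score is a hypothetical callable: train on the first pair of
        # arrays and return an error/likelihood metric on the withheld pair.
        scores.append(fit_and_score(X[train_idx], y[train_idx],
                                    X[test_idx], y[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```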
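
To make the quoted experiment setup concrete, below is a minimal NumPy sketch of the random-feature expansion of an RBF kernel for a single DGP hidden layer, using the figures quoted above (100 random features per hidden layer, D_F^(l) = 3, mini-batches of m = 200). Variable names, the weight initialisation, and the fixed draw of the spectral frequencies Omega are illustrative assumptions; this is not the authors' TensorFlow implementation, which treats the layer weights variationally and follows the staged Adam schedule described in the table.

```python
# Sketch of one DGP hidden layer built from RBF random features (Rahimi-Recht
# style trigonometric features), with the layer widths quoted in the section.
import numpy as np

rng = np.random.default_rng(0)

def rbf_random_features(X, Omega, log_sigma2, log_lengthscale):
    """Map inputs X (n, d_in) to trigonometric random features (n, 2*n_rf)."""
    # Dividing inputs by the lengthscale is equivalent to sampling Omega from
    # the RBF spectral density with that lengthscale.
    XO = (X / np.exp(log_lengthscale)) @ Omega            # (n, n_rf)
    scale = np.sqrt(np.exp(log_sigma2) / Omega.shape[1])
    return scale * np.concatenate([np.cos(XO), np.sin(XO)], axis=1)

def dgp_layer(X, Omega, W, log_sigma2, log_lengthscale):
    """One hidden layer: random features followed by a linear map to D_F GPs."""
    Phi = rbf_random_features(X, Omega, log_sigma2, log_lengthscale)
    return Phi @ W                                          # (n, D_F)

# Settings quoted in the section: 100 random features, D_F^(l) = 3, m = 200.
n_rf, d_in, d_out = 100, 5, 3
Omega = rng.standard_normal((d_in, n_rf))                   # fixed spectral frequencies (assumption)
W = 0.1 * rng.standard_normal((2 * n_rf, d_out))            # layer weights; treated variationally in the paper

X_batch = rng.standard_normal((200, d_in))                  # one mini-batch of size m = 200
F = dgp_layer(X_batch, Omega, W, log_sigma2=0.0, log_lengthscale=0.0)
print(F.shape)                                              # (200, 3)
```

Stacking such layers and feeding the output F into the next layer's feature map gives the deep architecture; in the quoted setup the kernel parameters Θ would be held fixed for the first 12,000 Adam iterations before all parameters are optimized jointly.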