Sparse within Sparse Gaussian Processes using Neighbor Information
Authors: Gia-Lac Tran, Dimitrios Milios, Pietro Michiardi, Maurizio Filippone
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive experimental validation that demonstrates the effectiveness of our approach compared to the state-of-the-art. Our approach makes it possible to use sparse GPs with a large number of inducing points without incurring a prohibitive computational cost. We extensively validate these properties on a variety of regression and classification tasks. We also showcase SWSGP on a large-scale classification problem with M = 100,000; we are not aware of other approaches that can handle such a large set of inducing inputs without imposing some special structure on them (e.g., grid) or without considering one-dimensional inputs. We conduct experiments to evaluate SWSGP under a variety of experimental conditions. |
| Researcher Affiliation | Academia | (1) Department of Data Science, Eurecom, France; (2) Department of Computer Science, National University of Singapore, Singapore. |
| Pseudocode | Yes | Algorithm 1 Sparse within sparse GP (SWSGP) |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | The comparison is carried out on some UCI data sets for regression and classification, i.e., POWERPLANT, KIN, PROTEIN, EEG, CREDIT and SPAM. We also consider larger-scale data sets, such as WAVE, QUERY and the AIRLINE data, or image classification on MNIST. The task for the AIRLINE data set is the classification of whether flights are subject to a delay, and we follow the same setup as in Hensman et al. (2013) and Wilson et al. (2016). |
| Dataset Splits | Yes | The results are averaged over 10 folds. |
| Hardware Specification | No | The paper mentions: 'We have attempted to run SVGP with such large values of M without success (out of memory in a system with 32GB of RAM).' This refers to a general RAM size but no specific CPU or GPU models or types are provided for the hardware used to conduct the reported experiments. |
| Software Dependencies | No | The paper mentions: 'All models are trained using the Adam optimizer (Kingma & Ba, 2015).' and 'We use the Matérn-5/2 kernel in all cases except for the AIRLINE dataset, where the sum of a Matérn-3/2 and a linear kernel is used, similar to Hensman et al. (2015).' While these are software components/algorithms, specific version numbers for libraries or frameworks (e.g., PyTorch, TensorFlow) are not provided. |
| Experiment Setup | Yes | All models are trained using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001 and a mini-batch size of 64. The likelihoods for regression and binary classification are set to a Gaussian and a probit function, respectively. |
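
The Pseudocode row above refers to Algorithm 1 (SWSGP), which is not reproduced in this report. The sketch below only illustrates the neighbor-selection idea named in the paper's title: for each mini-batch, restrict computation to the inducing inputs that are nearest neighbors of the batch points, so a very large global set (e.g., M = 100,000) never enters a single kernel computation. The choice of k, the use of scikit-learn, and all function and variable names are assumptions for illustration, not the authors' algorithm.

```python
# Illustrative sketch of neighbor-based inducing-point selection.
# All names and the value of k are hypothetical; this is NOT Algorithm 1 from the paper.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbor_inducing_subset(Z, x_batch, k=16):
    """Return the union of the k nearest inducing inputs (rows of Z)
    for every point in the mini-batch x_batch, plus their indices."""
    nn = NearestNeighbors(n_neighbors=k).fit(Z)
    _, idx = nn.kneighbors(x_batch)      # (batch_size, k) indices into Z
    subset = np.unique(idx.ravel())      # de-duplicate across the batch
    return Z[subset], subset


# Usage: with M = 100,000 inducing inputs and a mini-batch of 64,
# only a few hundred inducing points are touched per optimization step.
Z = np.random.randn(100_000, 8)
x_batch = np.random.randn(64, 8)
Z_active, active_idx = neighbor_inducing_subset(Z, x_batch, k=16)
print(Z_active.shape)
```

The Experiment Setup row reports the concrete training configuration: Adam with a learning rate of 0.001, mini-batches of 64, a Matérn-5/2 kernel, and a Gaussian likelihood for regression. The sketch below wires these reported values into a plain sparse variational GP baseline; GPyTorch is an assumption (the paper does not name a framework), and this is not the authors' SWSGP implementation.

```python
# Minimal SVGP training sketch using the reported hyperparameters
# (Adam, lr=0.001, batch size 64, Matérn-5/2, Gaussian likelihood).
# GPyTorch is assumed as the backend; the paper does not state one.
import torch
import gpytorch
from torch.utils.data import DataLoader, TensorDataset


class SVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        # Matérn-5/2 kernel, as reported for all data sets except AIRLINE
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def train(train_x, train_y, num_inducing=128, epochs=10):
    model = SVGPModel(train_x[:num_inducing].clone())
    likelihood = gpytorch.likelihoods.GaussianLikelihood()  # regression case
    optimizer = torch.optim.Adam(
        list(model.parameters()) + list(likelihood.parameters()), lr=0.001
    )
    mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
    loader = DataLoader(TensorDataset(train_x, train_y), batch_size=64, shuffle=True)
    model.train(); likelihood.train()
    for _ in range(epochs):
        for x_batch, y_batch in loader:
            optimizer.zero_grad()
            loss = -mll(model(x_batch), y_batch)
            loss.backward()
            optimizer.step()
    return model, likelihood
```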
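For binary classification the paper reports a probit (Bernoulli) likelihood instead of the Gaussian one; in the sketch above that would amount to swapping the likelihood and ELBO for their classification counterparts, with everything else (optimizer, learning rate, batch size) unchanged.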