Leveraging Locality and Robustness to Achieve Massively Scalable Gaussian Process Regression

Authors: Robert Allison, Anthony Stephenson, Samuel F, Edward O Pyzer-Knapp

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We exhibit a very simple GPnn regression algorithm with stand-out performance compared to other state-of-the-art GP approximations as measured on large UCI datasets.
Researcher Affiliation | Collaboration | Robert Allison, Department of Mathematics, Bristol University, marfa@bristol.ac.uk; Anthony Stephenson, Department of Mathematics, Bristol University; Samuel F, Alan Turing Institute; Edward Pyzer-Knapp, IBM Research
Pseudocode | Yes | Algorithm 1: Simulation of GPnn Robustness and Limiting Behaviour (an illustrative nearest-neighbour prediction sketch is given after the table)
Open Source Code | Yes | Implementational Details: Comparisons are made between our method and the state-of-the-art approaches of SVGP [14] and five distributed methods ([15, 2, 33, 7] and [18], following the recommendation in [4]). We have chosen not to include other highly-performant approximations... Code: https://github.com/ant-stephenson/gpnn-experiments/
Open Datasets | Yes | We consider a variety of datasets from the standard UCI machine learning repository.
Dataset Splits | No | Runs were made on three randomly selected 7/9, 2/9 splits into training and test sets.
Hardware Specification | Yes | SVGP was run on a single Tesla V100 GPU with 16GB memory; all distributed methods were run on eight Intel Xeon Platinum 8000 CPUs sharing 32GB of memory. Our method was run on a MacBook Pro with a 2.4 GHz Intel Core i5.
Software Dependencies | No | The paper mentions software such as the scikit-learn NearestNeighbors package and GPyTorch but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For Table 1 we set e = 3000, s = 300, m = 10. We set the number of nearest neighbours to be m = 400 for all usages in this paper... SVGP used 1024 inducing points; the distributed methods all used randomly selected subsets of sizes as close as possible to 625. The learning rate for the Adam optimiser was 0.01 for SVGP and 0.1 for our method and the distributed methods. (A hedged setup sketch follows the table.)
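
The Pseudocode and Software Dependencies rows above refer to Algorithm 1 and to the scikit-learn NearestNeighbors package. As a rough illustration only, and not the authors' implementation, here is a minimal sketch of the kind of nearest-neighbour GP prediction step a GPnn-style method is built around; the RBF kernel, the fixed hyperparameters, the noise level and the names rbf_kernel and gpnn_predict are all assumptions rather than details taken from the paper or its repository.

```python
# Illustrative sketch of a nearest-neighbour GP prediction step (not the
# authors' reference implementation).  Kernel choice, hyperparameters and
# noise level are placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * lengthscale^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gpnn_predict(X_train, y_train, X_test, m=400, noise=0.1):
    # For each test point, condition a GP only on its m nearest training points.
    nn = NearestNeighbors(n_neighbors=m).fit(X_train)
    _, idx = nn.kneighbors(X_test)                      # (n_test, m) neighbour indices
    means, variances = [], []
    for x_star, nbrs in zip(X_test, idx):
        Xm, ym = X_train[nbrs], y_train[nbrs]
        K = rbf_kernel(Xm, Xm) + noise * np.eye(m)      # local Gram matrix plus noise
        k_star = rbf_kernel(Xm, x_star[None, :])        # (m, 1) cross-covariances
        alpha = np.linalg.solve(K, ym)                  # K^{-1} y on the local subset
        means.append((k_star.T @ alpha).item())
        v = np.linalg.solve(K, k_star)
        var = rbf_kernel(x_star[None, :], x_star[None, :]) - k_star.T @ v
        variances.append(var.item() + noise)            # predictive variance of y*
    return np.array(means), np.array(variances)
```

Restricting each prediction to m neighbours keeps the per-test-point cost at roughly O(m^3) for the local solve plus the neighbour search, independent of the full training-set size, which is the locality the method relies on.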
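
Similarly, the Dataset Splits and Experiment Setup rows could be exercised as in the sketch below. The synthetic data, random seed and variable names are placeholders standing in for a UCI dataset, and gpnn_predict refers to the sketch above; only the 2/9 test fraction and m = 400 are taken from the reported setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 8))                      # stand-in for a UCI feature matrix
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5000)   # stand-in targets

# One of the three random 7/9 train, 2/9 test splits described in the paper;
# the seed here is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2/9, random_state=0)

mu, var = gpnn_predict(X_train, y_train, X_test, m=400)  # m = 400 neighbours, as reported
```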