Variational Nearest Neighbor Gaussian Process

Authors: Luhuan Wu, Geoff Pleiss, John P. Cunningham

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.
Researcher Affiliation | Academia | 1 Department of Statistics, Columbia University; 2 Zuckerman Institute, Columbia University.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | VNNGP is implemented in the GPyTorch library. See the example at https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/VNNGP.html (a usage sketch follows the table).
Open Datasets | Yes | We choose two datasets, Elevators with dimension D = 16 (Asuncion & Newman, 2007), and UKHousing (https://landregistry.data.gov.uk/)... We consider a wide range of high dimensional and spatiotemporal datasets from the UCI repository (Asuncion & Newman, 2007). In addition we include three spatial datasets, UKHousing as mentioned in Section 4.1, Precipitation (a monthly precipitation dataset with D = 3) (Lyon, 2004; Lyon & Barnston, 2005), and Covtype (a tree cover dataset with D = 54) (Blackard & Dean, 1999).
Dataset Splits | Yes | Each dataset is randomly split into 64% training, 16% validation, and 20% testing sets (see the split sketch after the table).
Hardware Specification | Yes | For example, for medium-sized datasets, e.g. Protein (N = 25.6K, D = 9), it takes no more than 30 seconds to build up nearest neighbor structures with K = 256 and M = N on an NVIDIA RTX 2080 GPU.
Software Dependencies | No | The paper states that VNNGP is implemented in the GPyTorch library but does not specify a version number for GPyTorch or any other software dependencies.
Experiment Setup | Yes | For all methods, we use an Adam optimizer and a MultiStepLR scheduler dropping the learning rate by a factor of 10 at 75% and 90% of the optimization iterations; all kernels are Matern 5/2 kernels with a separate lengthscale per dimension; the kernel lengthscales, outputscale, and likelihood noise parameter (if any) are all initialized as 0.6931... All methods are trained with {300, 500} iterations and a learning rate of {0.005, 0.01} for datasets of size below 50K, and with {100, 300} iterations and a learning rate of {0.005, 0.001} for datasets above 50K. (A configuration sketch follows the table.)
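
The Open Source Code row above links to the GPyTorch VNNGP tutorial. The sketch below illustrates what such a model can look like; it is a minimal sketch assuming the `NNVariationalStrategy` and `MeanFieldVariationalDistribution` API available in recent GPyTorch releases, and the class name `VNNGPModel`, the argument defaults, and the exact constructor keywords are our illustrative assumptions rather than a verbatim copy of the linked example.

```python
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import MeanFieldVariationalDistribution, NNVariationalStrategy


class VNNGPModel(ApproximateGP):
    """Minimal VNNGP-style model: inducing points are placed at the observed
    inputs (M = N) and each point conditions on at most k nearest neighbors."""

    def __init__(self, inducing_points, k=256, training_batch_size=256):
        m, d = inducing_points.shape
        # Mean-field (fully factorized) variational posterior over the inducing values.
        variational_distribution = MeanFieldVariationalDistribution(m)
        # Nearest-neighbor variational strategy, as in the GPyTorch VNNGP example.
        variational_strategy = NNVariationalStrategy(
            self, inducing_points, variational_distribution,
            k=k, training_batch_size=training_batch_size,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ZeroMean()
        # Matern 5/2 kernel with a separate lengthscale per dimension (ARD) plus an
        # outputscale, matching the Experiment Setup row.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=d)
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

    def __call__(self, x, prior=False, **kwargs):
        # As in the linked tutorial, allow x=None so the strategy can be queried
        # on its current mini-batch of inducing points during training.
        if x is not None and x.dim() == 1:
            x = x.unsqueeze(-1)
        return self.variational_strategy(x=x, prior=prior, **kwargs)
```

Training then proceeds with a standard variational ELBO (`gpytorch.mlls.VariationalELBO`) and a Gaussian likelihood, iterating over mini-batches of the inducing points.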
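For the Dataset Splits row, the 64/16/20 protocol is straightforward to reproduce. The helper below is a minimal sketch; the function name, seed handling, and use of PyTorch indexing are our illustrative choices, not taken from the paper.

```python
import torch


def split_dataset(X, y, seed=0):
    """Randomly split (X, y) into 64% train, 16% validation, 20% test."""
    n = X.shape[0]
    perm = torch.randperm(n, generator=torch.Generator().manual_seed(seed))
    n_train, n_val = int(0.64 * n), int(0.16 * n)
    splits = {
        "train": perm[:n_train],
        "val": perm[n_train:n_train + n_val],
        "test": perm[n_train + n_val:],  # remaining ~20%
    }
    return {name: (X[idx], y[idx]) for name, idx in splits.items()}
```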
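Finally, the Experiment Setup row pins down the optimizer, scheduler, kernel, and initialization. The sketch below expresses that configuration in PyTorch/GPyTorch terms, reusing the model class from the first sketch; the helper name `configure_training` and the particular choice of 300 iterations with learning rate 0.01 (one cell of the quoted grid) are our assumptions.

```python
import torch


def configure_training(model, likelihood, num_iter=300, lr=0.01):
    # Initialize kernel lengthscales, outputscale, and likelihood noise to 0.6931
    # (approximately log 2, i.e. softplus(0)), as stated in the Experiment Setup row.
    init = 0.6931
    model.covar_module.base_kernel.lengthscale = init
    model.covar_module.outputscale = init
    likelihood.noise = init

    # Adam over all model and likelihood hyperparameters.
    optimizer = torch.optim.Adam(
        list(model.parameters()) + list(likelihood.parameters()), lr=lr
    )
    # MultiStepLR: drop the learning rate by a factor of 10 at 75% and 90% of the
    # optimization iterations.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[int(0.75 * num_iter), int(0.90 * num_iter)], gamma=0.1
    )
    return optimizer, scheduler
```

The remaining grid values quoted above ({300, 500} iterations with learning rate {0.005, 0.01} below 50K; {100, 300} iterations with learning rate {0.005, 0.001} above 50K) correspond to different `num_iter` and `lr` settings.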