Variational nearest neighbor Gaussian process
Authors: Luhuan Wu, Geoff Pleiss, John P Cunningham
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods. |
| Researcher Affiliation | Academia | Department of Statistics, Columbia University; Zuckerman Institute, Columbia University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | VNNGP is implemented in the GPyTorch library. See the example at https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/VNNGP.html (a minimal usage sketch also follows this table). |
| Open Datasets | Yes | We choose two datasets, Elevators with dimension D = 16 (Asuncion & Newman, 2007), and UKHousing (https://landregistry.data.gov.uk/)... We consider a wide range of high-dimensional and spatiotemporal datasets from the UCI repository (Asuncion & Newman, 2007). In addition, we include three spatial datasets: UKHousing as mentioned in Section 4.1, Precipitation (a monthly precipitation dataset with D = 3) (Lyon, 2004; Lyon & Barnston, 2005), and Covtype (a tree cover dataset with D = 54) (Blackard & Dean, 1999). |
| Dataset Splits | Yes | Each dataset is randomly split into 64% training, 16% validation and 20% testing sets (a split sketch follows this table). |
| Hardware Specification | Yes | For example, for medium-sized datasets, e.g. Protein (N = 25.6K, D = 9), it takes no more than 30 seconds to build up nearest neighbor structures with K = 256 and M = N on an NVIDIA RTX 2080 GPU. |
| Software Dependencies | No | The paper states that VNNGP is implemented in the GPyTorch library, but it does not specify a version number for GPyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all methods, we use an Adam optimizer and a MultiStepLR scheduler dropping the learning rate by a factor of 10 at 75% and 90% of the optimization iterations; all kernels are Matérn 5/2 kernels with a separate lengthscale per dimension; the kernel lengthscales, outputscale and likelihood noise parameter (if any) are all initialized as 0.6931... All methods are trained with {300, 500} iterations and a learning rate of {0.005, 0.01} for datasets of size below 50K, and with {100, 300} iterations and a learning rate of {0.005, 0.001} for above 50K. (A configuration sketch follows this table.) |
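For reference, below is a minimal model-definition sketch in the spirit of the GPyTorch VNNGP tutorial linked above, not the authors' exact experiment code. It assumes a recent GPyTorch release that ships `NNVariationalStrategy`; the class name `VNNGPModel` and the values of `k` and `training_batch_size` are illustrative.

```python
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import MeanFieldVariationalDistribution
from gpytorch.variational.nearest_neighbor_variational_strategy import NNVariationalStrategy


class VNNGPModel(ApproximateGP):
    """VNNGP sketch: every training point acts as an inducing point (M = N),
    and the variational posterior is sparsified with K nearest neighbors."""

    def __init__(self, inducing_points, k=256, training_batch_size=256):
        m, d = inducing_points.shape
        # Mean-field variational distribution over the M inducing values.
        variational_distribution = MeanFieldVariationalDistribution(m)
        variational_strategy = NNVariationalStrategy(
            self, inducing_points, variational_distribution,
            k=k, training_batch_size=training_batch_size,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ZeroMean()
        # Matern 5/2 kernel with a separate (ARD) lengthscale per input dimension.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=d)
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

    def __call__(self, x, prior=False, **kwargs):
        # As in the GPyTorch tutorial: calling the model with x=None lets the
        # strategy draw a mini-batch of inducing points during training.
        if x is not None and x.dim() == 1:
            x = x.unsqueeze(-1)
        return self.variational_strategy(x=x, prior=prior, **kwargs)
```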
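The reported 64% / 16% / 20% split is a plain random shuffle. A hypothetical sketch follows, with random tensors standing in for a real dataset; `N`, `D`, and the variable names are illustrative.

```python
import torch

# Stand-in data; in the paper these would be a UCI or spatial dataset.
N, D = 10_000, 16
X, y = torch.randn(N, D), torch.randn(N)

# Random 64% / 16% / 20% train / validation / test split.
perm = torch.randperm(N)
n_train, n_val = int(0.64 * N), int(0.16 * N)
train_x, train_y = X[perm[:n_train]], y[perm[:n_train]]
val_x, val_y = X[perm[n_train:n_train + n_val]], y[perm[n_train:n_train + n_val]]
test_x, test_y = X[perm[n_train + n_val:]], y[perm[n_train + n_val:]]
```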
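Finally, a hedged sketch of the reported optimization setup, reusing `VNNGPModel` and the `train_x`/`train_y` split from the sketches above. The milestone arithmetic and the observation that 0.6931 ≈ ln 2 = softplus(0) (GPyTorch's default when raw parameters start at zero) are interpretation, not quoted from the paper.

```python
import torch
import gpytorch

num_iterations = 300   # paper: {300, 500} below 50K points, {100, 300} above
learning_rate = 0.005  # paper: {0.005, 0.01} below 50K, {0.005, 0.001} above

# In VNNGP every training point is an inducing point (M = N).
model = VNNGPModel(inducing_points=train_x, k=256, training_batch_size=256)
likelihood = gpytorch.likelihoods.GaussianLikelihood()

# Initialize hyperparameters at 0.6931 (= ln 2 = softplus(0); these are also
# GPyTorch's defaults when the raw, unconstrained parameters are zero).
model.covar_module.base_kernel.lengthscale = 0.6931
model.covar_module.outputscale = 0.6931
likelihood.noise = 0.6931

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()),
    lr=learning_rate,
)
# MultiStepLR: drop the learning rate by a factor of 10 at 75% and 90% of iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[int(0.75 * num_iterations), int(0.90 * num_iterations)],
    gamma=0.1,
)

mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
```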