Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Variational nearest neighbor Gaussian process
Authors: Luhuan Wu, Geoff Pleiss, John P Cunningham
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods. |
| Researcher Affiliation | Academia | (1) Department of Statistics, Columbia University; (2) Zuckerman Institute, Columbia University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | VNNGP is implemented in the GPyTorch library. See the example in https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/VNNGP.html |
| Open Datasets | Yes | We choose two datasets, Elevators with dimension D = 16 (Asuncion & Newman, 2007), and UKHousing (https://landregistry.data.gov.uk/)... We consider a wide range of high dimensional and spatiotemporal datasets from the UCI repository (Asuncion & Newman, 2007). In addition we include three spatial datasets, UKHousing as mentioned in Section 4.1, Precipitation (a monthly precipitation dataset with D = 3) (Lyon, 2004; Lyon & Barnston, 2005) and Covtype (a tree cover dataset with D = 54) (Blackard & Dean, 1999). |
| Dataset Splits | Yes | Each dataset is randomly split to 64% training, 16% validation and 20% testing sets. |
| Hardware Specification | Yes | For example, for medium-sized datasets, e.g. Protein (N = 25.6K, D = 9), it takes no more than 30 seconds to build up nearest neighbor structures with K = 256 and M = N on an NVIDIA RTX 2080 GPU. |
| Software Dependencies | No | The paper mentions that VNNGP is implemented in the GPyTorch library but does not specify a version number for GPyTorch or any other software dependency. |
| Experiment Setup | Yes | For all methods, we use an Adam optimizer and a MultiStepLR scheduler dropping the learning rate by a factor of 10 at the 75% and 90% of the optimization iterations; all kernels are Matern 5/2 kernel with separate lengthscale per dimension; the kernel lengthscales, outputscale and likelihood noise parameter (if any) are all initialized as 0.6931... All methods are trained with {300, 500} iterations and learning rate of {0.005, 0.01} for datasets of size below 50K, and with {100, 300} iterations and learning rate of {0.005, 0.001} for above 50K. |
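
The split and learning-rate schedule quoted in the table can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`split_indices`, `lr_milestones`, `lr_at`), the seed, and the example dataset size of 1000 are all hypothetical, and the comment relating 0.6931 to ln 2 is an observation rather than something the paper states.

```python
import math
import random


def split_indices(n, seed=0):
    """Randomly partition range(n) into 64% train, 16% validation, 20% test.

    Hypothetical helper: the paper gives only the percentages, not the
    seed or the shuffling procedure.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * 64 // 100
    n_val = n * 16 // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def lr_milestones(num_iters):
    """Iterations at which the rate drops: 75% and 90% of training."""
    return [num_iters * 75 // 100, num_iters * 90 // 100]


def lr_at(step, base_lr, milestones, gamma=0.1):
    """Learning rate in effect at `step` under a multi-step decay schedule."""
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** drops


# The reported initial value 0.6931 matches ln 2 to four decimals
# (ln 2 is also softplus(0)); this link is an assumption, not from the paper.
INIT_VALUE = math.log(2.0)

train_idx, val_idx, test_idx = split_indices(1000)
milestones = lr_milestones(300)
```

With 300 iterations and a base rate of 0.01 (one of the settings listed for datasets below 50K), the rate would fall to 0.001 at iteration 225 and to 0.0001 at iteration 270.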