Exact Gaussian Processes on a Million Data Points
Authors: Ke Wang, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark on regression datasets from the UCI repository [1]. We find exact GPs offer notably better performance on these datasets, often exceeding a two-fold reduction in root-mean-squared error. The results show how non-parametric representations continue to significantly benefit from the addition of new training points, a valuable conceptual finding in favor of non-parametric approaches. These results clarify the relative performance of popular GP approximations against exact GPs in an unexplored data size regime and enable future comparisons against other GP approximations. |
| Researcher Affiliation | Collaboration | 1Cornell University, 2Uber AI Labs, 3NVIDIA, 4New York University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Code to reproduce the experiments is available at https://gpytorch.ai. |
| Open Datasets | Yes | We benchmark on regression datasets from the UCI repository [1]. |
| Dataset Splits | Yes | Each dataset is randomly split into 4/9 training, 2/9 validating, and 3/9 testing sets. We use the validation set for tuning parameters like the CG training tolerance. (A split sketch follows the table.) |
| Hardware Specification | Yes | We perform all training on a single machine with 8 NVIDIA Tesla V100-SXM2-32GB-LS GPUs. All predictions are made on one NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper states 'We extend the GPyTorch library [11] to perform all experiments.' but does not provide specific version numbers for GPyTorch or any other software dependencies, such as PyTorch or CUDA, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For SGPR, we perform 100 iterations of Adam with a learning rate of 0.1. For SVGP, we perform 100 epochs of Adam with a minibatch size of 1,024 and a learning rate of 0.01, which we found to perform better than 0.1. For exact GPs, the number of optimization steps has the greatest effect on the training time for large datasets. To reduce the training time for exact GPs, we first randomly subset 10,000 training points from the full training set to fit an exact GP whose hyperparameters will be used as initialization. We pretrain on this subset with 10 steps of L-BFGS [21] and 10 steps of Adam [19] with 0.1 step size before using the learned hyperparameters to take 3 steps of Adam on the full training dataset. For all experiments, we use a rank-100 partial pivoted-Cholesky preconditioner and run PCG with a tolerance of ϵ = 1 during training. We constrain the learned noise to be at least 0.1 to regularize the poorly conditioned kernel matrix for the houseelectric dataset. (A training-configuration sketch follows the table.) |
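
The 4/9 : 2/9 : 3/9 split described in the Dataset Splits row reduces to simple index bookkeeping. The sketch below is a minimal illustration, not the authors' code; the seed and the rounding at the split boundaries are assumptions.

```python
import numpy as np

def split_4_2_3(n_points, seed=0):
    """Randomly split indices into 4/9 train, 2/9 validation, 3/9 test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_points)
    n_train = (4 * n_points) // 9
    n_val = (2 * n_points) // 9
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]
    return train_idx, val_idx, test_idx
```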
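
The Experiment Setup row describes the exact-GP training configuration: a few Adam steps on the full training set, a rank-100 pivoted-Cholesky preconditioner, a PCG tolerance of 1 during training, and a noise floor of 0.1 for houseelectric. Below is a minimal GPyTorch sketch of that configuration. It assumes current GPyTorch API names (the paper does not pin versions), omits the multi-GPU kernel partitioning and the 10,000-point pretraining phase, uses an RBF kernel purely for illustration, and the helper name `train_exact_gp` is hypothetical; treat it as a sketch rather than the authors' implementation.

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    """Exact GP with a constant mean and an RBF kernel (illustrative choice)."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

def train_exact_gp(train_x, train_y, num_steps=3, lr=0.1, noise_floor=0.1):
    # Noise floor of 0.1 regularizes poorly conditioned kernel matrices
    # (used for houseelectric in the quoted setup).
    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        noise_constraint=gpytorch.constraints.GreaterThan(noise_floor)
    )
    model = ExactGPModel(train_x, train_y, likelihood)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Rank-100 partial pivoted-Cholesky preconditioner and a CG tolerance of 1.0,
    # matching the settings quoted in the Experiment Setup row.
    with gpytorch.settings.max_preconditioner_size(100), \
         gpytorch.settings.cg_tolerance(1.0):
        for _ in range(num_steps):  # 3 Adam steps on the full training set
            optimizer.zero_grad()
            loss = -mll(model(train_x), train_y)
            loss.backward()
            optimizer.step()
    return model, likelihood
```

In the paper's procedure, the hyperparameters passed into this final full-data phase would first be initialized by pretraining on a random 10,000-point subset (10 L-BFGS steps followed by 10 Adam steps); that stage is not shown here.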