Exact Gaussian Processes on a Million Data Points
Authors: Ke Wang, Geoff Pleiss, Jacob Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark on regression datasets from the UCI repository [1]. We find exact GPs offer notably better performance on these datasets, often exceeding a two-fold reduction in root-mean-squared error. The results show how non-parametric representations continue to significantly benefit from the addition of new training points, a valuable conceptual finding in favor of non-parametric approaches. These results clarify the relative performance of popular GP approximations against exact GPs in an unexplored data size regime and enable future comparisons against other GP approximations. |
| Researcher Affiliation | Collaboration | 1Cornell University, 2Uber AI Labs, 3NVIDIA, 4New York University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Code to reproduce the experiments is available at https://gpytorch.ai. |
| Open Datasets | Yes | We benchmark on regression datasets from the UCI repository [1]. |
| Dataset Splits | Yes | Each dataset is randomly split into 4/9 training, 2/9 validating, and 3/9 testing sets. We use the validation set for tuning parameters like the CG training tolerance. (A split sketch follows the table.) |
| Hardware Specification | Yes | We perform all training on a single machine with 8 NVIDIA Tesla V100-SXM2-32GB-LS GPUs. All predictions are made on one NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper states 'We extend the GPyTorch library [11] to perform all experiments.' but does not provide specific version numbers for GPyTorch or any other software dependencies, such as PyTorch or CUDA, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For SGPR, we perform 100 iterations of Adam with a learning rate of 0.1. For SVGP, we perform 100 epochs of Adam with a minibatch size of 1,024 and a learning rate of 0.01, which we found to perform better than 0.1. For exact GPs, the number of optimization steps has the greatest effect on the training time for large datasets. To reduce the training time for exact GPs, we first randomly subset 10,000 training points from the full training set to fit an exact GP whose hyperparameters will be used as initialization. We pretrain on this subset with 10 steps of L-BFGS [21] and 10 steps of Adam [19] with 0.1 step size before using the learned hyperparameters to take 3 steps of Adam on the full training dataset. For all experiments, we use a rank-100 partial pivoted-Cholesky preconditioner and run PCG with a tolerance of ϵ = 1 during training. We constrain the learned noise to be at least 0.1 to regularize the poorly conditioned kernel matrix for the houseelectric dataset. (A training-configuration sketch follows the table.) |
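
The 4/9 : 2/9 : 3/9 split described in the Dataset Splits row reduces to simple index bookkeeping. The sketch below is a minimal illustration, not the authors' code; the seed and the rounding at the split boundaries are assumptions.

```python
import numpy as np

def split_4_2_3(n_points, seed=0):
    """Randomly split indices into 4/9 train, 2/9 validation, 3/9 test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_points)
    n_train = (4 * n_points) // 9
    n_val = (2 * n_points) // 9
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]
    return train_idx, val_idx, test_idx
```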
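
The Experiment Setup row describes the exact-GP training configuration: a few Adam steps on the full training set, a rank-100 pivoted-Cholesky preconditioner, a PCG tolerance of 1 during training, and a noise floor of 0.1 for houseelectric. Below is a minimal GPyTorch sketch of that configuration. It assumes current GPyTorch API names (the paper does not pin versions), omits the multi-GPU kernel partitioning and the 10,000-point pretraining phase, uses an RBF kernel purely for illustration, and the helper name `train_exact_gp` is hypothetical; treat it as a sketch rather than the authors' implementation.

```python
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    """Exact GP with a constant mean and an RBF kernel (illustrative choice)."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

def train_exact_gp(train_x, train_y, num_steps=3, lr=0.1, noise_floor=0.1):
    # Noise floor of 0.1 regularizes poorly conditioned kernel matrices
    # (used for houseelectric in the quoted setup).
    likelihood = gpytorch.likelihoods.GaussianLikelihood(
        noise_constraint=gpytorch.constraints.GreaterThan(noise_floor)
    )
    model = ExactGPModel(train_x, train_y, likelihood)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Rank-100 partial pivoted-Cholesky preconditioner and a CG tolerance of 1.0,
    # matching the settings quoted in the Experiment Setup row.
    with gpytorch.settings.max_preconditioner_size(100), \
         gpytorch.settings.cg_tolerance(1.0):
        for _ in range(num_steps):  # 3 Adam steps on the full training set
            optimizer.zero_grad()
            loss = -mll(model(train_x), train_y)
            loss.backward()
            optimizer.step()
    return model, likelihood
```

In the paper's procedure, the hyperparameters passed into this final full-data phase would first be initialized by pretraining on a random 10,000-point subset (10 L-BFGS steps followed by 10 Adam steps); that stage is not shown here.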