Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization
Authors: Geoff Pleiss, Martin Jankowiak, David Eriksson, Anil Damle, Jacob Gardner
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method's applicability on matrices as large as 50,000 × 50,000 (well beyond traditional methods) with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy. In particular, we perform variational GP inference with up to 10,000 inducing points and perform Gibbs sampling on a 25,000-dimensional problem. |
| Researcher Affiliation | Collaboration | Geoff Pleiss, Columbia University (gmp2162@columbia.edu); Martin Jankowiak, The Broad Institute (mjankowi@broadinstitute.org); David Eriksson, Facebook (deriksson@fb.com); Anil Damle, Cornell University (damle@cornell.edu); Jacob R. Gardner, University of Pennsylvania (jacobrg@seas.upenn.edu) |
| Pseudocode | Yes | Alg. 1 (see Appendix) summarizes this approach; below we highlight its computational properties: |
| Open Source Code | Yes | Code examples for the GPyTorch framework are available at bit.ly/ciq_variational and bit.ly/ciq_sampling. ... We have released an open-source implementation of this algorithm to facilitate the adoption of this method. See bit.ly/ciq_variational and bit.ly/ciq_sampling. |
| Open Datasets | Yes | We compare msMINRES-CIQ-SVGP against Cholesky-SVGP on 3 large-scale datasets: a GIS dataset (3droad, D = 2) [34], a monthly precipitation dataset (Precipitation, D = 3) [52, 53], and a tree cover dataset (Covtype, D = 54) [9]. Details on these datasets (including how to acquire them) are in Appx. F. |
| Dataset Splits | No | The paper mentions using a "test-set" for evaluation, but does not provide specific train/validation/test split percentages, sample counts, or a detailed splitting methodology. It only mentions training models with 10^3 to 10^4 inducing points. |
| Hardware Specification | Yes | Timings are performed on an NVIDIA 1070 GPU. ... msMINRES-CIQ models are up to 5.6x faster than Cholesky models (on a Titan RTX GPU). ... using a Titan RTX GPU. |
| Software Dependencies | No | The paper mentions the "GPyTorch framework" but does not specify any software versions for libraries or other dependencies used in the experiments. |
| Experiment Setup | Yes | msMINRES is stopped after achieving a relative residual of 10^-4 or after reaching J = 400 iterations. ... Optimization typically requires up to 10,000 iterations of training [e.g. 66]. ... For 3droad we use a Gaussian observation model. The Precipitation dataset has noisier observations; therefore we apply a Student-T observation model. Finally, we reduce the CovType dataset to a binary classification problem and apply a Bernoulli observation model. |
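
For context on the Pseudocode row: the paper's Alg. 1 computes matrix-root products such as K^{1/2} b by combining a quadrature rule with shifted linear solves (K + t_q^2 I)^{-1} b. The NumPy sketch below illustrates that general idea only; it substitutes a naive tan-substitution Gauss-Legendre rule and dense solves for the paper's elliptic-integral quadrature and msMINRES solver, and the matrix size, node count, and random seed are illustrative assumptions.

```python
import numpy as np

def quadrature_sqrt_mv(K, b, num_quad=32):
    """Approximate K^{1/2} b via quadrature over shifted solves.

    Uses the identity K^{-1/2} = (2/pi) * int_0^inf (K + t^2 I)^{-1} dt,
    then K^{1/2} b = K @ (K^{-1/2} b). Substituting t = tan(theta) maps the
    integral to [0, pi/2], where plain Gauss-Legendre converges well for
    reasonably conditioned K. (The paper instead uses an elliptic-integral
    quadrature plus msMINRES shifted solves; this is only a sketch.)
    """
    nodes, weights = np.polynomial.legendre.leggauss(num_quad)
    theta = (nodes + 1.0) * (np.pi / 4.0)   # map [-1, 1] -> [0, pi/2]
    w = weights * (np.pi / 4.0)             # Jacobian of that affine map
    n = K.shape[0]
    acc = np.zeros_like(b)
    for th, wq in zip(theta, w):
        shift = np.tan(th) ** 2             # t^2 for this quadrature node
        jac = 1.0 / np.cos(th) ** 2         # dt = sec^2(theta) dtheta
        acc += wq * jac * np.linalg.solve(K + shift * np.eye(n), b)
    return K @ ((2.0 / np.pi) * acc)

# Sanity check against an eigendecomposition-based square root.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 200))
K = X @ X.T / 200 + 1e-1 * np.eye(200)      # SPD test matrix
b = rng.standard_normal(200)
evals, evecs = np.linalg.eigh(K)
exact = evecs @ (np.sqrt(evals) * (evecs.T @ b))
print(np.linalg.norm(exact - quadrature_sqrt_mv(K, b)) / np.linalg.norm(exact))
```

On a well-conditioned test matrix like this one the printed relative error is tiny; the paper's elliptic-integral quadrature is designed to reach comparable accuracy with far fewer nodes on the much larger, worse-conditioned matrices quoted in the Research Type row.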
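
Relatedly, for the Open Source Code and Experiment Setup rows: below is a minimal sketch of the kind of variational GP the bit.ly/ciq_variational examples target, assuming the `CiqVariationalStrategy` and `NGD` natural-gradient optimizer shipped in recent GPyTorch releases alongside this paper. The kernel choice, inducing-point count, data dimensions, learning rates, and minibatch are illustrative assumptions, not the paper's settings (which use up to 10^4 inducing points).

```python
import torch
import gpytorch

class CiqSVGP(gpytorch.models.ApproximateGP):
    """SVGP model whose covariance-root operations use CIQ instead of Cholesky."""

    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.NaturalVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.CiqVariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Illustrative setup: natural gradient descent for the variational parameters
# (the pairing the paper recommends with CIQ) plus Adam for hyperparameters.
num_data = 10000
model = CiqSVGP(torch.randn(1000, 3))
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=num_data)
ngd = gpytorch.optim.NGD(model.variational_parameters(), num_data=num_data, lr=0.1)
adam = torch.optim.Adam(
    list(model.hyperparameters()) + list(likelihood.parameters()), lr=0.01
)

# One training step on a placeholder minibatch.
x_batch, y_batch = torch.randn(256, 3), torch.randn(256)
ngd.zero_grad(); adam.zero_grad()
loss = -mll(model(x_batch), y_batch)
loss.backward()
ngd.step(); adam.step()
```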