Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michal Derezinski, Madeleine Udell

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | ADASAP outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, ADASAP scales to a dataset with > 3 × 10^8 samples, a feat which has not been accomplished in the literature. We empirically verify that ADASAP, with its default hyperparameters, outperforms state-of-the-art competitors on benchmark large-scale GP inference tasks, and is capable of scaling to a dataset with n > 3 × 10^8 samples.
Researcher Affiliation | Academia | Pratik Rathore, Stanford University, EMAIL; Zachary Frangella, Stanford University, EMAIL; Sachin Garg, University of Michigan, EMAIL; Shaghayegh Fazliani, Stanford University, EMAIL; Michał Dereziński, University of Michigan, EMAIL; Madeleine Udell, Stanford University, EMAIL
Pseudocode | Yes | Algorithm 1 SAP for K_λ W = Y ... Algorithm 2 ADASAP for K_λ W = Y ... Algorithm 3 ColDistMatMat ... Algorithm 4 RowDistMatMat ... Algorithm 5 RandNysAppx ... Algorithm 6 GetStepsize ... Algorithm 7 NestAcc
Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/pratikrathore8/scalable_gp_inference.
Open Datasets | Yes | We benchmark on six large-scale regression datasets from the UCI repository, OpenML, and sGDML [Chmiela et al., 2017]. The results are reported in Table 1. ... To demonstrate the power of ADASAP on huge-scale problems, we perform GP inference on a subset of the taxi dataset (https://github.com/toddwschneider/nyc-taxi-data) with n = 3.31 × 10^8 samples and dimension d = 9: the task is to predict taxi ride durations in New York City.
Dataset Splits | Yes | The results are averaged over five 90%-10% train-test splits of each dataset. ... Due to computational constraints, we use a single 99%-1% train-test split, and run each method for a single pass through the dataset.
Hardware Specification | Yes | Our experiments are run in single precision on 48 GB NVIDIA RTX A6000 GPUs using Python 3.10, PyTorch 2.6.0 [Paszke et al., 2019], and CUDA 12.5. We use 2, 3, and 1 GPU(s) per experiment in Sections 5.1 to 5.3, respectively.
Software Dependencies | Yes | Our experiments are run in single precision on 48 GB NVIDIA RTX A6000 GPUs using Python 3.10, PyTorch 2.6.0 [Paszke et al., 2019], and CUDA 12.5.
Experiment Setup | Yes | We use a zero-mean prior for all datasets. We train the kernel variance, likelihood variance, and lengthscale (we use a separate lengthscale for each dimension of X) using the procedure of Lin et al. [2023], which we restate for completeness: 1. Select a centroid point from the training data X uniformly at random. 2. Select the 10,000 points in the training data that are closest to the centroid in Euclidean norm. 3. Find hyperparameters by maximizing the exact GP likelihood over this subset of training points. 4. Repeat the previous three steps for 10 centroids and average the resulting hyperparameters. ... For GP inference on large-scale datasets, we use blocksize b = n/100 in ADASAP, ADASAP-I, and SDD; blocksize b = n/2,000 for transportation data analysis; and b = n/5 for Bayesian optimization. We set the rank r = 100 for both ADASAP and PCG. Similar to Lin et al. [2024], we set the stepsize in SDD to be one of {1/n, 10/n, 100/n} (this grid corresponds to SDD-1, SDD-10, and SDD-100), the momentum to 0.9, and the averaging parameter to 100/T_max.
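The Pseudocode row names Algorithm 1 (SAP for K_λ W = Y) without reproducing it. The following is a minimal sketch that assumes K_λ = K + λI with λ the likelihood variance, sketches that select b coordinates uniformly at random, and projection in the K_λ-norm. Under these assumptions sketch-and-project reduces to randomized block coordinate descent. It is not the authors' ADASAP, which, judging by the names of Algorithms 5 to 7, also involves a randomized Nyström approximation, a stepsize rule, and Nesterov acceleration. The function name `sap_block_coordinate` and its parameters are hypothetical.

```python
import numpy as np


def sap_block_coordinate(K, Y, lam, b, iters, seed=0):
    """Sketch-and-project for (K + lam * I) W = Y with random coordinate blocks.

    With the sketch S = I[:, B] and projection in the K_lam-norm, each SAP step
    solves the b x b subsystem exactly:
        W[B] <- W[B] - (K_lam[B, B])^{-1} (K_lam @ W - Y)[B],
    which is randomized block coordinate descent.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    W = np.zeros(Y.shape, dtype=float)
    for _ in range(iters):
        B = rng.choice(n, size=b, replace=False)  # b distinct coordinates
        K_B = K[B, :]  # b x n row block of K
        R_B = K_B @ W + lam * W[B] - Y[B]  # rows B of the residual K_lam W - Y
        K_BB = K_B[:, B] + lam * np.eye(b)  # b x b diagonal block of K_lam
        W[B] -= np.linalg.solve(K_BB, R_B)  # exact solve of the sketched system
    return W
```

Each iteration reads only a b × n row block of K. Here K is a dense array for simplicity; at the scale reported in the table, kernel rows would have to be formed on the fly instead.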
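The Dataset Splits row reports averages over five 90%-10% train-test splits, and a single 99%-1% split for the taxi data. The following is a minimal sketch of that protocol. Drawing each split as an independent random permutation from a fixed seed is an assumption, since the row does not say how the splits were generated. `random_splits` and `mean_over_splits` are hypothetical helpers.

```python
import numpy as np


def random_splits(n, n_splits=5, test_frac=0.1, seed=0):
    """Yield (train_idx, test_idx) pairs for independent random train-test splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * n))
    for _ in range(n_splits):
        perm = rng.permutation(n)  # fresh shuffle for every split
        yield perm[n_test:], perm[:n_test]


def mean_over_splits(evaluate, n, n_splits=5, test_frac=0.1, seed=0):
    """Average evaluate(train_idx, test_idx), e.g. a test RMSE, over the splits."""
    scores = [evaluate(tr, te) for tr, te in random_splits(n, n_splits, test_frac, seed)]
    return float(np.mean(scores))
```

For the taxi setting, `random_splits(n, n_splits=1, test_frac=0.01)` yields a single 99%-1% split.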
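The Experiment Setup row restates the four-step hyperparameter procedure of Lin et al. [2023]. The sketch below follows those steps under several assumptions. It uses an ARD squared-exponential (RBF) kernel, which the row does not name. It optimizes log-hyperparameters with SciPy's L-BFGS-B, using finite-difference gradients and box bounds. In step 4 it takes the arithmetic mean of the hyperparameters on their original scale. All function names are hypothetical. The default subset size matches the 10,000 points in step 2, but smaller subsets are convenient for a quick check.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def ard_rbf(X1, X2, lengthscales, variance):
    """ARD squared-exponential kernel (an assumed kernel choice)."""
    Z1, Z2 = X1 / lengthscales, X2 / lengthscales
    sq = (Z1**2).sum(1)[:, None] + (Z2**2).sum(1)[None, :] - 2.0 * Z1 @ Z2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))


def neg_log_marginal_likelihood(log_params, X, y):
    """Exact negative log marginal likelihood of a zero-mean GP."""
    d = X.shape[1]
    lengthscales = np.exp(log_params[:d])
    variance, noise = np.exp(log_params[d]), np.exp(log_params[d + 1])
    # Likelihood variance plus a small jitter for numerical stability.
    K = ard_rbf(X, X, lengthscales, variance) + (noise + 1e-6) * np.eye(len(y))
    c, lower = cho_factor(K, lower=True)
    alpha = cho_solve((c, lower), y)
    log_det = 2.0 * np.log(np.diag(c)).sum()
    return 0.5 * (y @ alpha + log_det + len(y) * np.log(2.0 * np.pi))


def fit_hyperparameters(X, y, subset_size=10_000, n_centroids=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = min(subset_size, n)
    bounds = [(-5.0, 5.0)] * (d + 2)  # box on log-hyperparameters (assumption)
    estimates = []
    for _ in range(n_centroids):
        # Step 1: pick a centroid uniformly at random from the training data.
        centroid = X[rng.integers(n)]
        # Step 2: keep the m training points closest to it in Euclidean norm.
        idx = np.argsort(np.linalg.norm(X - centroid, axis=1))[:m]
        # Step 3: maximize the exact GP likelihood by minimizing its negative.
        res = minimize(neg_log_marginal_likelihood, np.zeros(d + 2),
                       args=(X[idx], y[idx]), method="L-BFGS-B", bounds=bounds)
        estimates.append(np.exp(res.x))
    # Step 4: average the hyperparameters over the centroids.
    mean = np.mean(estimates, axis=0)
    return {"lengthscales": mean[:d], "kernel_variance": mean[d],
            "noise_variance": mean[d + 1]}
```

Calling `fit_hyperparameters(X_train, y_train)` with the defaults mirrors the 10-centroid, 10,000-point setting described in the row, subject to the kernel and optimizer assumptions above.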