Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project

Authors: Pratik Rathore, Zachary Frangella, Sachin Garg, Shaghayegh Fazliani, Michal Derezinski, Madeleine Udell

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	ADASAP outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, ADASAP scales to a dataset with > 3 × 10^8 samples, a feat which has not been accomplished in the literature. We empirically verify that ADASAP, with its default hyperparameters, outperforms state-of-the-art competitors on benchmark large-scale GP inference tasks, and is capable of scaling to a dataset with n > 3 × 10^8 samples.
Researcher Affiliation	Academia	Pratik Rathore, Stanford University, EMAIL; Zachary Frangella, Stanford University, EMAIL; Sachin Garg, University of Michigan, EMAIL; Shaghayegh Fazliani, Stanford University, EMAIL; Michał Dereziński, University of Michigan, EMAIL; Madeleine Udell, Stanford University, EMAIL
Pseudocode	Yes	Algorithm 1 SAP for KλW = Y ... Algorithm 2 ADASAP for KλW = Y ... Algorithm 3 ColDistMatMat ... Algorithm 4 RowDistMatMat ... Algorithm 5 RandNysAppx ... Algorithm 6 GetStepsize ... Algorithm 7 NestAcc
Open Source Code	Yes	Code for reproducing our experiments is available at https://github.com/pratikrathore8/scalable_gp_inference.
Open Datasets	Yes	We benchmark on six large-scale regression datasets from the UCI repository, OpenML, and sGDML [Chmiela et al., 2017]. The results are reported in Table 1. ... To demonstrate the power of ADASAP on huge-scale problems, we perform GP inference on a subset of the taxi dataset (https://github.com/toddwschneider/nyc-taxi-data) with n = 3.31 × 10^8 samples and dimension d = 9: the task is to predict taxi ride durations in New York City.
Dataset Splits	Yes	The results are averaged over five 90%-10% train-test splits of each dataset. ... Due to computational constraints, we use a single 99%-1% train-test split, and run each method for a single pass through the dataset.
Hardware Specification	Yes	Our experiments are run in single precision on 48 GB NVIDIA RTX A6000 GPUs using Python 3.10, PyTorch 2.6.0 [Paszke et al., 2019], and CUDA 12.5. We use 2, 3, and 1 GPU(s) per experiment in Sections 5.1 to 5.3, respectively.
Software Dependencies	Yes	Our experiments are run in single precision on 48 GB NVIDIA RTX A6000 GPUs using Python 3.10, PyTorch 2.6.0 [Paszke et al., 2019], and CUDA 12.5.
Experiment Setup	Yes	We use a zero-mean prior for all datasets. We train the kernel variance, likelihood variance, and lengthscale (we use a separate lengthscale for each dimension of X) using the procedure of Lin et al. [2023], which we restate for completeness: 1. Select a centroid point from the training data X uniformly at random. 2. Select the 10,000 points in the training data that are closest to the centroid in Euclidean norm. 3. Find hyperparameters by maximizing the exact GP likelihood over this subset of training points. 4. Repeat the previous three steps for 10 centroids and average the resulting hyperparameters. ... For GP inference on large-scale datasets, we use blocksize b = n/100 in ADASAP, ADASAP-I, and SDD; blocksize b = n/2,000 for transportation data analysis, and b = n/5 for Bayesian optimization. We set the rank r = 100 for both ADASAP and PCG. Similar to Lin et al. [2024], we set the stepsize in SDD to be one of {1/n, 10/n, 100/n} (this grid corresponds to SDD-1, SDD-10, and SDD-100), the momentum to 0.9, and the averaging parameter to 100/Tmax.
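
The four-step hyperparameter procedure quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the names `nearest_subset`, `average_hyperparameters`, and `fit_exact_gp` are hypothetical, the likelihood maximization of step 3 is left to a caller-supplied function, and storing hyperparameters as a flat dictionary (for example one key per lengthscale dimension) is an assumption made for simplicity.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Hyperparams = Dict[str, float]  # assumed layout: one scalar per named hyperparameter


def nearest_subset(X: List[Point], centroid: Point, m: int) -> List[int]:
    """Return indices of the m points in X closest to `centroid` (Euclidean norm)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], centroid))
    return order[:m]


def average_hyperparameters(
    X: List[Point],
    fit_exact_gp: Callable[[List[int]], Hyperparams],  # hypothetical step-3 fitter
    subset_size: int = 10_000,
    num_centroids: int = 10,
    seed: int = 0,
) -> Hyperparams:
    """Average hyperparameters fitted on local subsets around random centroids."""
    rng = random.Random(seed)
    fits: List[Hyperparams] = []
    for _ in range(num_centroids):
        # Step 1: pick a centroid uniformly at random from the training data.
        centroid = X[rng.randrange(len(X))]
        # Step 2: take the points closest to that centroid.
        idx = nearest_subset(X, centroid, min(subset_size, len(X)))
        # Step 3: fit hyperparameters on the subset (delegated to the caller).
        fits.append(fit_exact_gp(idx))
    # Step 4: average each hyperparameter across the centroids.
    return {k: sum(f[k] for f in fits) / len(fits) for k in fits[0]}
```

In practice `fit_exact_gp` would wrap whatever exact GP library is in use and return the fitted kernel variance, likelihood variance, and lengthscales for the selected indices. Sorting all distances is the simplest way to pick the nearest points; for large n a partial selection would be cheaper.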