Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Authors: Jihao Andreas Lin, Javier Antorán, Shreyas Padhy, David Janz, José Miguel Hernández-Lobato, Alexander Terenin

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
Researcher Affiliation | Academia | 1 University of Cambridge, 2 Max Planck Institute for Intelligent Systems, 3 University of Alberta, 4 Cornell University
Pseudocode | No | The paper describes methods mathematically and textually, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at: https://github.com/cambridge-mlg/sgd-gp
Open Datasets | Yes | We consider 9 datasets from the UCI repository [16] ranging in size from N = 15k to N ≈ 2M datapoints
Dataset Splits | No | We report mean and standard deviation over five 90% train / 10% test splits for the small and medium datasets, and three splits for the largest dataset. No explicit validation split percentage is provided. (A minimal sketch of this split protocol is given after the table.)
Hardware Specification | Yes | on an RTX 2070 GPU, on an A100 GPU, on a single core of a TPUv2 device
Software Dependencies | No | The paper mentions software such as the JAX SciPy module, GPJax, optax.clip_by_global_norm, and ANNOY, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For all regression experiments we use a learning rate of 0.5 to estimate the mean function representer weights, and a learning rate of 0.1 to draw samples. For Thompson sampling, we use a learning rate of 0.3 for the mean and 0.0003 for the samples. In both settings, we perform gradient clipping using optax.clip_by_global_norm with max_norm set to 0.1. (An optimizer-configuration sketch is given after the table.)
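
The evaluation protocol quoted in the Dataset Splits row can be spelled out in a few lines. Below is a minimal sketch, not the authors' code, of running a model over five random 90% train / 10% test splits and reporting the mean and standard deviation of a metric; `fit_and_score`, `X`, and `y` are hypothetical placeholders for training the GP and computing a test metric on the given split.

```python
import numpy as np

def evaluate_over_splits(X, y, fit_and_score, n_splits=5, test_fraction=0.1, seed=0):
    """Evaluate `fit_and_score` on random train/test splits and
    return the mean and standard deviation of the resulting metric."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(len(X))        # shuffle indices for this split
        n_test = int(test_fraction * len(X))  # 10% of the data held out for testing
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        scores.append(fit_and_score(X[train_idx], y[train_idx],
                                    X[test_idx], y[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```

Per the quoted protocol, the same helper would be called with `n_splits=3` for the largest dataset.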
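
The Experiment Setup row quotes concrete learning rates and a gradient-clipping threshold, and the reference to optax.clip_by_global_norm suggests the optimizers are built with optax. The sketch below assembles optimizers matching those numbers; it is a sketch under assumptions, not the authors' training code. Only the learning rates and max_norm = 0.1 come from the paper; whether momentum or iterate averaging is used is not stated in the quoted text, so plain optax.sgd is assumed here.

```python
import optax

MAX_NORM = 0.1  # gradient clipping threshold quoted in the paper

def clipped_sgd(learning_rate):
    """Plain SGD preceded by global-norm gradient clipping."""
    return optax.chain(
        optax.clip_by_global_norm(MAX_NORM),
        optax.sgd(learning_rate),
    )

# Regression experiments: one optimizer for the posterior mean's
# representer weights, another for drawing posterior samples.
regression_mean_opt = clipped_sgd(0.5)
regression_sample_opt = clipped_sgd(0.1)

# Thompson sampling experiments use smaller learning rates.
thompson_mean_opt = clipped_sgd(0.3)
thompson_sample_opt = clipped_sgd(0.0003)

# Standard optax usage (params and grads are placeholders):
# opt_state = regression_mean_opt.init(params)
# updates, opt_state = regression_mean_opt.update(grads, opt_state)
# params = optax.apply_updates(params, updates)
```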