Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent
Authors: Jihao Andreas Lin, Javier Antorán, Shreyas Padhy, David Janz, José Miguel Hernández-Lobato, Alexander Terenin
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. |
| Researcher Affiliation | Academia | University of Cambridge, Max Planck Institute for Intelligent Systems, University of Alberta, Cornell University |
| Pseudocode | No | The paper describes methods mathematically and textually, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at: https://github.com/cambridge-mlg/sgd-gp. |
| Open Datasets | Yes | We consider 9 datasets from the UCI repository [16] ranging in size from N = 15k to N = 2M datapoints |
| Dataset Splits | No | We report mean and standard deviation over five 90%/10% train-test splits for the small and medium datasets, and three splits for the largest dataset. No explicit validation split percentage is provided. |
| Hardware Specification | Yes | on an RTX 2070 GPU, on an A100 GPU, on a single core of a TPUv2 device |
| Software Dependencies | No | The paper mentions software such as the 'JAX SciPy module', 'GPJax', 'optax.clip_by_global_norm', and 'ANNOY', but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | For all regression experiments we use a learning rate of 0.5 to estimate the mean function representer weights, and a learning rate of 0.1 to draw samples. For Thompson sampling, we use a learning rate of 0.3 for the mean and 0.0003 for the samples. In both settings, we perform gradient clipping using optax.clip_by_global_norm with max_norm set to 0.1. |
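The reported experiment setup maps directly onto standard optax primitives. The following is a minimal sketch, not the authors' released code, of how the quoted hyperparameters could be assembled: gradient clipping via `optax.clip_by_global_norm` with `max_norm = 0.1`, chained with plain SGD at the reported learning rates. The quadratic `loss_fn` and the `make_optimizer` helper are hypothetical stand-ins; the paper optimizes representer weights for the mean and posterior samples rather than this placeholder objective.

```python
# Minimal sketch (not the authors' released code) of the optimizer setup quoted
# above: SGD with gradient clipping by global norm (max_norm = 0.1). The loss
# below is a hypothetical stand-in for the representer-weight objective.
import jax
import jax.numpy as jnp
import optax


def make_optimizer(learning_rate: float, max_norm: float = 0.1):
    # Clip gradients by global norm, then apply plain SGD, as reported.
    return optax.chain(
        optax.clip_by_global_norm(max_norm),
        optax.sgd(learning_rate),
    )


def loss_fn(params):
    # Placeholder quadratic loss standing in for the actual objective.
    return jnp.sum(params ** 2)


params = jnp.ones(4)
optimizer = make_optimizer(learning_rate=0.5)  # 0.1 when drawing posterior samples
opt_state = optimizer.init(params)

grads = jax.grad(loss_fn)(params)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```

For the Thompson sampling experiments, the same construction would be used with the reported learning rates of 0.3 (mean) and 0.0003 (samples).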