Variational Gaussian Processes with Decoupled Conditionals

Authors: Xinran Zhu, Kaiwen Wu, Natalie Maus, Jacob Gardner, David Bindel

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find this additional flexibility leads to improved model performance on a variety of regression tasks and Bayesian optimization (BO) applications. We evaluate the performance of decoupled models proposed in Sec. 3.2: DCSVGP (variational GPs using decoupled lengthscales) and SVGP-DCDKL (variational GPs with deep kernel learning using decoupled deep feature extractors).
Researcher Affiliation | Academia | 1Cornell University, 2University of Pennsylvania. {xz584,bindel}@cornell.edu, {kaiwenwu,nmaus,jacobrg}@seas.upenn.edu
Pseudocode | No | The paper describes methods and derivations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/xinranzhu/Variational-GP-Decoupled-Conditionals.
Open Datasets | Yes | We consider 10 UCI regression datasets [10] with up to 386508 training examples and up to 380 dimensions. [10] D. Dua and C. Graff. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2017. URL http://archive.ics.uci.edu/ml.
Dataset Splits | Yes | Results are averaged over 10 random train/validation/test splits. We train for 300 epochs using training batch size 1024. We selected the best training hyperparameters for SVGP and use the same ones for all models: learning rate lr = 5e-3 and a multistep learning rate scheduler (multiplicative factor γ = 0.2).
Hardware Specification | No | All experiments use an RBF kernel and a zero prior mean and are accelerated through GPyTorch [14] on a single GPU. This mentions 'a single GPU' but does not specify the make or model of the GPU.
Software Dependencies | No | All experiments use an RBF kernel and a zero prior mean and are accelerated through GPyTorch [14] on a single GPU. We use the Adam [25] optimizer... These quotes name software packages but do not include version numbers.
Experiment Setup | Yes | We use the Adam [25] optimizer with a multistep scheduler to train all models on all datasets, and we train for 300 epochs using training batch size 1024. We selected the best training hyperparameters for SVGP and use the same ones for all models: learning rate lr = 5e-3 and a multistep learning rate scheduler (multiplicative factor γ = 0.2). (A hedged code sketch of this configuration follows the table.)
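
For reference, the quoted setup (RBF kernel, zero prior mean, GPyTorch on a single GPU, Adam with lr = 5e-3, a multistep scheduler with γ = 0.2, 300 epochs, batch size 1024) corresponds to a standard SVGP training loop. The sketch below illustrates that baseline configuration only, not the paper's decoupled models (DCSVGP, SVGP-DCDKL); the scheduler milestones, inducing-point count, and synthetic data are assumptions made for illustration.

```python
# Minimal sketch of the reported baseline: a standard SVGP in GPyTorch with an RBF
# kernel and zero mean, trained with Adam (lr = 5e-3) and MultiStepLR (gamma = 0.2).
# The decoupled-conditional models from the paper are NOT implemented here; the
# milestones, inducing-point count, and toy data below are assumptions.
import torch
import gpytorch
from torch.utils.data import TensorDataset, DataLoader


class SVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        dist = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(0))
        strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, dist, learn_inducing_locations=True
        )
        super().__init__(strategy)
        self.mean_module = gpytorch.means.ZeroMean()  # zero prior mean, as reported
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Synthetic stand-in for a UCI regression split (assumption; not the paper's data).
train_x = torch.rand(4096, 8)
train_y = torch.sin(train_x.sum(dim=-1))
loader = DataLoader(TensorDataset(train_x, train_y), batch_size=1024, shuffle=True)

model = SVGPModel(inducing_points=train_x[:500].clone())  # 500 inducing points: an assumption
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=5e-3
)
# Milestones are illustrative; the report only states a multistep schedule with gamma = 0.2.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.2)

model.train()
likelihood.train()
for epoch in range(300):  # 300 epochs with training batch size 1024, as quoted
    for x_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = -mll(model(x_batch), y_batch)  # negative ELBO
        loss.backward()
        optimizer.step()
    scheduler.step()
```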