Variational Gaussian Processes with Decoupled Conditionals
Authors: Xinran Zhu, Kaiwen Wu, Natalie Maus, Jacob Gardner, David Bindel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find this additional flexibility leads to improved model performance on a variety of regression tasks and Bayesian optimization (BO) applications. We evaluate the performance of decoupled models proposed in Sec. 3.2: DCSVGP (variational GPs using decoupled lengthscales) and SVGP-DCDKL (variational GPs with deep kernel learning using decoupled deep feature extractors). (A schematic of the conditionals being decoupled follows the table.) |
| Researcher Affiliation | Academia | 1Cornell University 2University of Pennsylvania {xz584,bindel}@cornell.edu {kaiwenwu,nmaus,jacobrg}@seas.upenn.edu |
| Pseudocode | No | The paper describes methods and derivations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/xinranzhu/Variational-GP-Decoupled-Conditionals. |
| Open Datasets | Yes | We consider 10 UCI regression datasets [10] with up to 386508 training examples and up to 380 dimensions. [10] D. Dua and C. Graff. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2017. URL http://archive.ics.uci.edu/ ml. |
| Dataset Splits | Yes | Results are averaged over 10 random train/validation/test splits. We train for 300 epochs using training batch size 1024. We selected the best training hyperparameters for SVGP and use the same ones for all models: learning rate lr = 5e-3 and a multistep learning rate scheduler (multiplicative factor γ = 0.2). (A split sketch follows the table.) |
| Hardware Specification | No | All experiments use an RBF kernel and a zero prior mean and are accelerated through GPyTorch [14] on a single GPU. This mentions 'a single GPU' but does not specify the make or model of the GPU. |
| Software Dependencies | No | All experiments use an RBF kernel and a zero prior mean and are accelerated through GPyTorch [14] on a single GPU. We use the Adam [25] optimizer... These specify software packages but do not include version numbers. |
| Experiment Setup | Yes | We use the Adam [25] optimizer with a multistep scheduler to train all models on all datasets, and we train for 300 epochs using training batch size 1024. We selected the best training hyperparameters for SVGP and use the same ones for all models: learning rate lr = 5e-3 and a multistep learning rate scheduler (multiplicative factor γ = 0.2). (A training-setup sketch follows the table.) |
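The decoupled models quoted in the table (DCSVGP, SVGP-DCDKL) are easiest to place against the standard SVGP predictive conditionals. The display below is a minimal sketch of that standard, coupled form; per the paper's Sec. 3.2, DCSVGP lets the kernel lengthscale used in the mean conditional differ from the one used in the covariance conditional (and SVGP-DCDKL uses separate deep feature extractors), but the exact decoupled parameterization should be taken from the paper itself.

$$
\mu_f(\mathbf{x}) = \mathbf{k}_{\mathbf{x}Z}\, K_{ZZ}^{-1}\, \mathbf{m},
\qquad
\sigma_f^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_{\mathbf{x}Z}\, K_{ZZ}^{-1}\left(K_{ZZ} - S\right) K_{ZZ}^{-1}\, \mathbf{k}_{Z\mathbf{x}}
$$

Here $Z$ are the inducing points and $\mathbf{m}$, $S$ the variational mean and covariance; in the coupled form above, every kernel evaluation shares one lengthscale, and decoupling replaces that shared lengthscale with separate ones for the mean and covariance expressions.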
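For the split protocol ("10 random train/validation/test splits"), a minimal sketch follows. The split fractions and seeding are illustrative assumptions; the paper only states the number of splits.

```python
# Sketch of the evaluation protocol quoted above: average metrics over
# 10 random train/validation/test splits. Fractions and seed are assumptions.
import numpy as np

def random_splits(n, n_splits=10, frac_train=0.72, frac_val=0.08, seed=0):
    """Yield (train_idx, val_idx, test_idx) index arrays for each random split."""
    rng = np.random.default_rng(seed)
    n_train, n_val = int(frac_train * n), int(frac_val * n)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        yield perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```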
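The quoted experiment setup (GPyTorch, RBF kernel, zero prior mean, Adam at lr = 5e-3, multistep scheduler with γ = 0.2, 300 epochs, batch size 1024) can be sketched as a baseline SVGP training loop. This is not the authors' released code (see the repository linked in the table); the inducing-point count and scheduler milestones are assumptions not stated in the paper.

```python
# Minimal SVGP training sketch matching the reported settings: RBF kernel,
# zero prior mean, Adam (lr=5e-3), MultiStepLR (gamma=0.2), 300 epochs,
# batch size 1024. Inducing-point count and milestones are assumptions.
import torch
import gpytorch
from torch.utils.data import TensorDataset, DataLoader


class SVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ZeroMean()  # zero prior mean
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())  # RBF kernel

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def train_svgp(train_x, train_y, num_inducing=1024, epochs=300, batch_size=1024):
    model = SVGPModel(train_x[:num_inducing].clone())
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

    optimizer = torch.optim.Adam(
        list(model.parameters()) + list(likelihood.parameters()), lr=5e-3
    )
    # Multistep decay with gamma=0.2; the milestone epochs are an assumption.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.2)

    loader = DataLoader(TensorDataset(train_x, train_y), batch_size=batch_size, shuffle=True)
    model.train()
    likelihood.train()
    for _ in range(epochs):
        for x_batch, y_batch in loader:
            optimizer.zero_grad()
            loss = -mll(model(x_batch), y_batch)  # negative ELBO on the mini-batch
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model, likelihood
```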