Scaling Gaussian Processes with Derivative Information Using Variational Inference
Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob Gardner, David Bindel
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. |
| Researcher Affiliation | Academia | Misha Padidar¹, Xinran Zhu¹, Leo Huang¹, Jacob R. Gardner², David Bindel¹. ¹Cornell University, (map454, xz584, ah839, bindel)@cornell.edu; ²University of Pennsylvania, jacobrg@seas.upenn.edu |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/mishapadidar/GP-Derivatives-Variational-Inference. |
| Open Datasets | Yes | Styblinski-Tang (2D) and Hartmann (6D) from [32], a modified 20D Welch test function [1], and a 5D sinusoid f(x) = sin(2π||x||₂) (Sin-5). [...] training a graph convolutional neural network [16] on the Pubmed citation dataset [28]. [...] regression on N = 500000 function and gradient observations gathered from a D = 45 dimensional optimization objective function through the FOCUS code [39]. [...] UCI benchmark regression datasets [4]: Elevators (D=18, N=16599), Kin40k (D=8, N=40000), Energy (D=8, N=768), Protein (D=9, N=45730), Kegg-Directed (D=20, N=53414). (A sketch of the Sin-5 function follows the table.) |
| Dataset Splits | Yes | We use an 80-20 train-test split for all experiments, and train for 800 epochs. |
| Hardware Specification | No | The paper states models were "accelerated through GPyTorch [8] on a single GPU," but does not specify the GPU model or any other hardware details. |
| Software Dependencies | No | The paper mentions "GPyTorch [8]" but does not specify a version number for it or for any other software dependency. |
| Experiment Setup | Yes | The inducing matrix size is 800 for all variational inducing point methods, while DSKI is trained on 800 inducing points per dimension. [...] using an Adam optimizer with a multi-step learning rate scheduler and 1000 epochs. [...] We use an 80-20 train-test split for all experiments, and train for 800 epochs. (A hedged training sketch follows the table.) |
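
The Sin-5 benchmark quoted in the Open Datasets row, f(x) = sin(2π||x||₂) on R⁵, has a closed-form gradient, which is what makes it a natural test case for GPs trained on derivative information. Below is a minimal sketch of generating function-plus-gradient observations for it; the sampling domain [-1, 1]⁵ and the target layout are assumptions for illustration, not taken from the paper.

```python
import torch

def sin5(x):
    """Sin-5 test function: f(x) = sin(2*pi*||x||_2) for x in R^5."""
    return torch.sin(2 * torch.pi * x.norm(dim=-1))

def sin5_grad(x):
    """Analytic gradient: 2*pi*cos(2*pi*||x||_2) * x / ||x||_2."""
    r = x.norm(dim=-1, keepdim=True)
    return 2 * torch.pi * torch.cos(2 * torch.pi * r) * x / r

# Illustrative function-plus-gradient observations on N random points in [-1, 1]^5
# (the sampling domain is an assumption).
N, D = 10000, 5
X = 2 * torch.rand(N, D) - 1
targets = torch.cat([sin5(X).unsqueeze(-1), sin5_grad(X)], dim=-1)  # shape (N, 1 + D)
```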
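
The Experiment Setup and Dataset Splits rows describe the training configuration: 800 inducing points, an Adam optimizer with a multi-step learning-rate scheduler, an 80-20 train-test split, and 800 epochs. The sketch below wires those settings into a plain GPyTorch SVGP as a stand-in; the paper's actual model (in the linked repository) additionally conditions on derivative observations, and the learning rate and scheduler milestones here are assumptions.

```python
import torch
import gpytorch
from torch.optim.lr_scheduler import MultiStepLR

# Synthetic Sin-5 data (function values only here) and an 80-20 train-test split.
N, D = 2000, 5
X = 2 * torch.rand(N, D) - 1
y = torch.sin(2 * torch.pi * X.norm(dim=-1))
n_train = int(0.8 * N)
perm = torch.randperm(N)
train_x, train_y = X[perm[:n_train]], y[perm[:n_train]]
test_x, test_y = X[perm[n_train:]], y[perm[n_train:]]

class SVGP(gpytorch.models.ApproximateGP):
    """Plain SVGP stand-in; the paper's model also ingests derivative labels."""
    def __init__(self, inducing_points):
        var_dist = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(0))
        var_strat = gpytorch.variational.VariationalStrategy(
            self, inducing_points, var_dist, learn_inducing_locations=True)
        super().__init__(var_strat)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

model = SVGP(inducing_points=train_x[:800].clone())  # 800 inducing points, as in the paper
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.numel())

params = list(model.parameters()) + list(likelihood.parameters())
optimizer = torch.optim.Adam(params, lr=0.01)                         # lr is an assumption
scheduler = MultiStepLR(optimizer, milestones=[400, 600], gamma=0.1)  # milestones are assumptions

model.train()
likelihood.train()
for epoch in range(800):  # 800 training epochs, per the paper
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Full-batch updates are used here only for brevity; for the paper's larger datasets (e.g. the N = 500000 stellarator regression task) mini-batching over a DataLoader would be the natural choice.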