Scaling Gaussian Processes with Derivative Information Using Variational Inference

Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob Gardner, David Bindel

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization.
Researcher Affiliation | Academia | Misha Padidar¹, Xinran Zhu¹, Leo Huang¹, Jacob R. Gardner², David Bindel¹; ¹Cornell University, (map454, xz584, ah839, bindel)@cornell.edu; ²University of Pennsylvania, jacobrg@seas.upenn.edu
Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/mishapadidar/GP-Derivatives-Variational-Inference.
Open Datasets | Yes | Styblinski-Tang (2D) and Hartmann (6D) from [32], a modified 20D Welch test function [1], and a 5D sinusoid f(x) = sin(2π||x||²) (Sin-5). [...] training a graph convolutional neural network [16] on the Pubmed citation dataset [28]. [...] regression on N = 500000 function and gradient observations gathered from a D = 45 dimensional optimization objective function through the FOCUS code [39]. [...] UCI benchmark regression datasets [4]: Elevators (D=18, N=16599), Kin40k (D=8, N=40000), Energy (D=8, N=768), Protein (D=9, N=45730), Kegg-Directed (D=20, N=53414).
Dataset Splits | Yes | We use an 80-20 train-test split for all experiments, and train for 800 epochs.
Hardware Specification | No | The paper states models were "accelerated through GPyTorch [8] on a single GPU," but does not specify the GPU model or any other hardware details.
Software Dependencies | No | The paper mentions GPyTorch [8] but does not specify a version number for it or for any other software dependency.
Experiment Setup | Yes | The inducing matrix size is 800 for all variational inducing point methods, while DSKI is trained on 800 inducing points per dimension. [...] using an Adam optimizer with a multi-step learning rate scheduler and 1000 epochs. [...] We use an 80-20 train-test split for all experiments, and train for 800 epochs.
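
The Sin-5 synthetic task quoted in the Open Datasets row is simple enough to reproduce directly. The sketch below is a minimal NumPy illustration, not the authors' data pipeline: it generates function and gradient observations for f(x) = sin(2π||x||²) and applies the 80-20 train-test split mentioned in the table. The sampling box, sample count, and random seed are assumptions, not values taken from the paper.

```python
import numpy as np

def sin5(X):
    """Sin-5 test function f(x) = sin(2*pi*||x||^2) for each row of X (N, 5)."""
    r2 = np.sum(X**2, axis=1)
    return np.sin(2 * np.pi * r2)

def sin5_grad(X):
    """Analytic gradient: df/dx_i = 4*pi*x_i*cos(2*pi*||x||^2)."""
    r2 = np.sum(X**2, axis=1, keepdims=True)
    return 4 * np.pi * X * np.cos(2 * np.pi * r2)

rng = np.random.default_rng(0)                 # seed chosen for illustration
X = rng.uniform(-1.0, 1.0, size=(10_000, 5))   # sampling box is an assumption
y = sin5(X)
dy = sin5_grad(X)

# Stack function values and gradients as (D+1)-dimensional targets, the form
# consumed by derivative-aware GP models, then make an 80-20 train-test split.
Y = np.hstack([y[:, None], dy])
perm = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, Y_train = X[perm[:n_train]], Y[perm[:n_train]]
X_test, Y_test = X[perm[n_train:]], Y[perm[n_train:]]
```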
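
The Experiment Setup row mentions 800 inducing points, an Adam optimizer with a multi-step learning-rate scheduler, and an 80-20 split trained for 800 epochs. The sketch below shows one way to wire those pieces together for a plain stochastic variational GP in GPyTorch; it omits the derivative observations that are the paper's actual contribution, and the kernel, learning rate, and scheduler milestones are illustrative assumptions rather than reported values.

```python
import torch
import gpytorch

# Stand-in data: 5D inputs and scalar targets (replace with a real dataset).
X_train = torch.randn(1000, 5)
y_train = torch.sin(2 * torch.pi * (X_train**2).sum(dim=1))

class SVGPModel(gpytorch.models.ApproximateGP):
    """Plain stochastic variational GP (no derivative observations)."""
    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution,
            learn_inducing_locations=True,
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

inducing_points = X_train[:800].clone()        # 800 inducing points, as quoted
model = SVGPModel(inducing_points)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=y_train.size(0))

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=0.01  # lr assumed
)
# Multi-step schedule; milestones and gamma are illustrative, not from the paper.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[400, 700], gamma=0.1
)

model.train()
likelihood.train()
for epoch in range(800):                       # "train for 800 epochs"
    optimizer.zero_grad()
    loss = -mll(model(X_train), y_train)       # negative variational ELBO
    loss.backward()
    optimizer.step()
    scheduler.step()                           # advance the LR schedule per epoch
```

In the paper's setting, this plain SVGP and its ELBO would be replaced by the derivative-aware variational GP implemented in the linked repository.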