Bézier Gaussian Processes for Tall and Wide Data

Authors: Martin Jørgensen, Michael A. Osborne

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We split our evaluation into four parts. First, we visually inspect the posterior on a one-dimensional toy dataset to show how the control points behave, and indicate that there indeed is a stationary-like behaviour on the domain of the hypercube. Next, we test empirically on some standard UCI benchmark datasets, to give insight into when Bézier GPs are applicable. After that, we switch to tall and wide data, large both in the input dimension and in the number of data points. These experiments give certainty that the method delivers on its key promise: scalability. Lastly, we turn our eyes to the method itself and investigate how performance is influenced by the ordering of dimensions.
Researcher Affiliation | Academia | Martin Jørgensen, Department of Engineering Science, University of Oxford (martinj@robots.ox.ac.uk); Michael A. Osborne, Department of Engineering Science, University of Oxford (mosb@robots.ox.ac.uk)
Pseudocode | No | The paper describes the steps and calculations of the Bézier buttress but does not include formal pseudocode blocks or algorithms.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] and Code is in supplements.
Open Datasets | Yes | We evaluate on eight small to mid-size real-world datasets commonly used to benchmark regression [Hernandez-Lobato and Adams, 2015]. and We generate one-dimensional inputs uniformly in the regions [0, 0.33] and [0.66, 1]; we sample 20 observations in each region. (A sketch of this toy-data generation appears after the table.)
Dataset Splits | Yes | We split each dataset into a train/test split with the ratio 90/10. We do this over 20 random splits and report test-set RMSE and log-likelihood average and standard deviation over splits. and If the latter, we use the validation split they use as training data. (A sketch of this split-and-evaluate protocol appears after the table.)
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] Our method ran on a laptop for all but one experiment. GPUs were used for some baselines.
Software Dependencies | No | Both phases use the Adam optimiser [Kingma and Ba, 2015]. This names a specific optimiser, but no versions are given for the overall software stack.
Experiment Setup | Yes | We split optimisation into two phases. First, we optimise all variational parameters, keeping the likelihood variance σ² fixed at τ⁻¹, with τ being the number of control points. After this initial phase, we optimise σ² with all variational parameters fixed. We let both phases run for 10000 iterations with a mini-batch size of 500, for all datasets. Both phases use the Adam optimiser [Kingma and Ba, 2015], the first phase with learning rate 0.001, and the second with learning rate 0.01. (A sketch of this two-phase schedule appears after the table.)
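
The one-dimensional toy data described in the Open Datasets row can be generated in a few lines of NumPy. This is a minimal sketch: only the two input regions and the 20 samples per region come from the quoted text; the true function `f` and the noise level are placeholders assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 inputs drawn uniformly in each of the two regions [0, 0.33] and [0.66, 1],
# leaving a gap in between (as in the quoted toy-data description).
x = np.concatenate([rng.uniform(0.0, 0.33, 20),
                    rng.uniform(0.66, 1.0, 20)])

# Placeholder response: the excerpt does not specify the true function or the
# observation-noise level, so both are assumptions here.
f = lambda t: np.sin(12 * t)
y = f(x) + 0.1 * rng.normal(size=x.shape)
```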
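The split-and-report protocol in the Dataset Splits row amounts to repeated 90/10 resampling with averaged metrics. The sketch below assumes a hypothetical `model_factory` returning an object with `fit`, `predict_mean`, and `log_likelihood` methods; these names are illustrative and not the paper's API.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(model_factory, X, y, n_splits=20, test_size=0.1, seed=0):
    """Repeat a 90/10 train/test split n_splits times and return the mean and
    standard deviation of test RMSE and test log-likelihood over splits."""
    rmses, lls = [], []
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + i)
        model = model_factory()
        model.fit(X_tr, y_tr)                 # hypothetical model interface
        mu = model.predict_mean(X_te)
        rmses.append(np.sqrt(np.mean((mu - y_te) ** 2)))
        lls.append(model.log_likelihood(X_te, y_te))
    return ((np.mean(rmses), np.std(rmses)),
            (np.mean(lls), np.std(lls)))
```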
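The two-phase schedule in the Experiment Setup row can be written as two Adam loops over disjoint parameter groups. The PyTorch sketch below assumes a hypothetical Bézier-GP module exposing `variational_parameters()`, a `log_noise_variance` parameter, and a `loss(batch)` method, plus a standard data loader; only the learning rates, iteration counts, mini-batch size of 500, and the σ² = 1/τ initialisation come from the quoted text.

```python
import math
import torch

def cycle(loader):
    """Endlessly cycle over a data loader to draw mini-batches."""
    while True:
        for batch in loader:
            yield batch

def train_two_phase(model, loader, num_control_points,
                    iters=10_000, lr_phase1=1e-3, lr_phase2=1e-2):
    """Phase 1: optimise variational parameters with sigma^2 frozen at 1/tau.
    Phase 2: optimise sigma^2 alone with all variational parameters frozen."""
    # Freeze sigma^2 at 1/tau (tau = number of control points).
    with torch.no_grad():
        model.log_noise_variance.fill_(math.log(1.0 / num_control_points))
    model.log_noise_variance.requires_grad_(False)

    batches = cycle(loader)

    # Phase 1: variational parameters only, learning rate 0.001.
    opt1 = torch.optim.Adam(model.variational_parameters(), lr=lr_phase1)
    for _ in range(iters):
        opt1.zero_grad()
        model.loss(next(batches)).backward()   # mini-batches of size 500
        opt1.step()

    # Phase 2: likelihood variance only, learning rate 0.01.
    for p in model.variational_parameters():
        p.requires_grad_(False)
    model.log_noise_variance.requires_grad_(True)
    opt2 = torch.optim.Adam([model.log_noise_variance], lr=lr_phase2)
    for _ in range(iters):
        opt2.zero_grad()
        model.loss(next(batches)).backward()
        opt2.step()
```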