Conditioning Sparse Variational Gaussian Processes for Online Decision-making
Authors: Wesley J. Maddox, Samuel Stanton, Andrew G. Wilson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show OVC provides compelling performance in a range of applications including active learning of malaria incidence, and reinforcement learning on MuJoCo simulated robotic control tasks. Our experimental evaluation demonstrates that SVGPs using OVC can be successfully used as surrogate models with advanced acquisition functions in Bayesian optimization, even in the large batch and non-Gaussian settings. |
| Researcher Affiliation | Academia | Wesley J. Maddox New York University wjm363@nyu.edu Samuel Stanton New York University ss13641@nyu.edu Andrew Gordon Wilson New York University andrewgw@cims.nyu.edu |
| Pseudocode | Yes | Algorithm 1 Online Variational Conditioning (OVC) (a conditioning sketch follows the table) |
| Open Source Code | Yes | Our code is available at https://github.com/wjmaddox/online_vargp. |
| Open Datasets | Yes | We consider data from the Malaria Global Atlas [81] describing the infection rate of a parasite known to cause malaria in 2017. Pivoted Cholesky updates perform significantly better, as shown in Figure 2 on the UCI protein dataset [19]. Finally, we consider MuJoCo problems using the OpenAI gym [75, 7] with LTSs inside of TuRBO with trust regions generated by Monte Carlo Tree Search following the procedure of Wang et al. [78]. |
| Dataset Splits | No | The paper describes initial points, batch sizes, and optimization steps, but does not provide explicit training, validation, or test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It mentions 'GPU Acceleration' in the context of the GPyTorch library, but not the specific hardware used by the authors. |
| Software Dependencies | No | The paper mentions using PyTorch, GPyTorch, and BoTorch but does not specify their version numbers. |
| Experiment Setup | Yes | As all components are differentiable, we use the Adam optimizer with a learning rate of 0.1 and optimize for 1000 steps or until the loss converges, whichever is shorter. We use 10 initial points and a batch size of 3, optimizing for 50 iterations. The kernel hyper-parameters are initialized to GPyTorch defaults (which sets all lengthscales to one), while the variational distribution is initialized to m_u = 0, S_u = I (again, GPyTorch defaults). (A training-setup sketch follows the table.) |
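
The pseudocode row refers to the paper's Algorithm 1 (OVC). As a hedged illustration only, and not a reproduction of that algorithm, the NumPy sketch below checks the pseudo-observation identity that motivates online conditioning: a variational distribution q(u) = N(m, S) at inducing locations Z can be rewritten as exact GP conditioning on pseudo-targets y_hat with pseudo-noise Sigma_hat, after which new data can be folded in with standard exact-GP updates. The kernel, sizes, and all numerical values here are assumptions chosen for demonstration.

```python
# Hedged illustration (not the paper's Algorithm 1): express q(u) = N(m, S)
# at inducing locations Z as exact GP conditioning on pseudo-observations
# (y_hat, Sigma_hat), the view that motivates online conditioning of SVGPs.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

n = 6
Z = np.linspace(-2.0, 2.0, n).reshape(-1, 1)   # synthetic inducing locations
Kuu = rbf_kernel(Z, Z) + 1e-6 * np.eye(n)      # prior covariance at Z

# A synthetic variational distribution q(u) = N(m, S) of the kind a fitted
# SVGP would produce (S is positive definite with Kuu - S positive definite).
rng = np.random.default_rng(0)
m = rng.normal(size=n)
W = 0.3 * rng.normal(size=(n, n))
W = W @ W.T + np.eye(n)                        # some positive-definite "noise"
S = Kuu - Kuu @ np.linalg.solve(Kuu + W, Kuu)

# Pseudo-observations: Sigma_hat = Kuu (Kuu - S)^{-1} Kuu - Kuu
#                      y_hat     = Kuu (Kuu - S)^{-1} m
Sigma_hat = Kuu @ np.linalg.solve(Kuu - S, Kuu) - Kuu
y_hat = Kuu @ np.linalg.solve(Kuu - S, m)

# Exact GP conditioning on (Z, y_hat) with noise Sigma_hat recovers q(u).
G = np.linalg.inv(Kuu + Sigma_hat)
m_rec = Kuu @ G @ y_hat
S_rec = Kuu - Kuu @ G @ Kuu
print(np.allclose(m_rec, m), np.allclose(S_rec, S))   # True True
```

Once the SVGP posterior is in this pseudo-data form, conditioning on a new batch amounts to appending it and reapplying the same closed-form update; the exact construction, efficiency considerations, and non-Gaussian-likelihood handling are what the paper's Algorithm 1 specifies.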
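
The experiment-setup row quotes the optimizer and initialization choices. The GPyTorch sketch below shows one way such a setup could look; it is an assumption-laden stand-in rather than the authors' code, and the synthetic data, Matérn-5/2 kernel, inducing-point count, and class names are placeholders.

```python
# Hypothetical sketch of an SVGP trained with Adam (lr = 0.1) for up to 1000
# steps, with the variational distribution starting at m_u = 0, S_u = I and
# kernel hyperparameters left at GPyTorch defaults, as described above.
import torch
import gpytorch

class SVGPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(
            inducing_points.size(0)
        )  # GPyTorch default init: zero mean, identity covariance
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super().__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)
        )  # hyperparameters left at library defaults

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# Synthetic stand-in data; the paper's tasks use e.g. malaria incidence maps.
train_x = torch.rand(100, 2)
train_y = torch.sin(train_x.sum(-1))

model = SVGPModel(inducing_points=train_x[:25].clone())
likelihood = gpytorch.likelihoods.GaussianLikelihood()
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=0.1
)

model.train()
likelihood.train()
for step in range(1000):  # "1000 steps or until the loss converges"
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```

In the reported Bayesian optimization runs, a model like this would start from 10 initial points and receive batches of 3 points over 50 iterations; how those batches are folded into the SVGP without refitting from scratch is exactly what OVC addresses.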