Dual Parameterization of Sparse Variational Gaussian Processes

Authors: Vincent Adam, Paul Chang, Mohammad Emtiyaz Khan, Arno Solin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sec. 5, Empirical Evaluation: We conduct experiments to highlight the advantages of using the dual parameterization. First, we study the effect of the improved objective for hyperparameter learning in t-SVGP versus q-SVGP, examining the objective optimized in a single M-step after an E-step run to convergence. We then show a full sequence of EM iterations on small data sets. For large-scale data, where running steps to convergence is expensive, we use partial E- and M-steps and mini-batching. Our improved bound and faster natural-gradient computations show benefits in both settings. (A training-loop sketch of this alternating scheme follows the table.)
Researcher Affiliation | Collaboration | Vincent Adam (Aalto University / Secondmind.ai, Espoo, Finland / Cambridge, UK; vincent.adam@aalto.fi), Paul E. Chang (Aalto University, Espoo, Finland; paul.chang@aalto.fi), Mohammad Emtiyaz Khan (RIKEN Center for AI Project, Tokyo, Japan; emtiyaz.khan@riken.jp), Arno Solin (Aalto University, Espoo, Finland; arno.solin@aalto.fi)
Pseudocode | Yes | The full algorithm is given in App. E.
Open Source Code | Yes | We provide a reference implementation of our method under the GPflow framework at https://github.com/AaltoML/t-SVGP.
Open Datasets | Yes | MNIST ([23], available under CC BY-SA 3.0); We use common small and mid-sized UCI data sets to test the performance of our method.
Dataset Splits | Yes | We perform 5-fold cross-validation, with the results in Fig. 3 showing the mean over the folds for ELBO and NLPD. (A cross-validation skeleton follows the table.)
Hardware Specification | Yes | We compare wall-clock time to compute 150 steps of the algorithm for both methods, in terms of NLPD and ELBO, taking single E- and M-steps (MacBook Pro, 2 GHz CPU, 16 GB RAM).
Software Dependencies | Yes | We compare against the state-of-the-art implementation of SVGP in GPflow ([26], v2.2.1).
Experiment Setup | Yes | All experiments are performed with a batch size of n_b = 200 and m = 100 inducing points, and the optimization is run until convergence using the Adam optimizer for the hyperparameters (M-step); Table 1: NLPD on MNIST benchmarks for different learning rates and E- and M-steps.
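
To make the alternating scheme and the quoted settings (mini-batches of n_b = 200, m = 100 inducing points, Adam on the hyperparameters, 150 alternating steps in the timing comparison) concrete, the sketch below wires them into a GPflow training loop. It is a minimal illustration, not the paper's code: it uses GPflow's stock SVGP model and NaturalGradient optimizer as a stand-in for the q-SVGP baseline, while the dual-parameterized t-SVGP updates themselves live in the authors' repository (https://github.com/AaltoML/t-SVGP) and are not reproduced here. The synthetic data, learning rates, and variable names are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the alternating E/M scheme described above, using
# GPflow's stock SVGP (the q-SVGP baseline) on synthetic regression data.
import numpy as np
import tensorflow as tf
import gpflow

# Toy data standing in for a small regression set (illustrative only).
N, D = 2000, 3
rng = np.random.default_rng(0)
X = rng.uniform(size=(N, D))
Y = np.sin(3 * X.sum(axis=1, keepdims=True)) + 0.1 * rng.standard_normal((N, 1))

# Settings quoted in the table: mini-batch size 200, m = 100 inducing points.
batch_size, num_inducing = 200, 100
Z = X[:num_inducing].copy()

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.Matern52(lengthscales=np.ones(D)),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=N,
)

# E-steps act on the variational parameters only, so exclude them from the
# variable list that Adam (the M-step) updates.
gpflow.utilities.set_trainable(model.q_mu, False)
gpflow.utilities.set_trainable(model.q_sqrt, False)

data_iter = iter(
    tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(N).batch(batch_size)
)
loss_fn = model.training_loss_closure(data_iter, compile=True)  # negative ELBO

natgrad = gpflow.optimizers.NaturalGradient(gamma=0.5)  # E-step (step size assumed)
adam = tf.optimizers.Adam(learning_rate=0.01)           # M-step (learning rate assumed)

for step in range(150):  # 150 alternating partial E- and M-steps
    natgrad.minimize(loss_fn, var_list=[(model.q_mu, model.q_sqrt)])  # partial E-step
    adam.minimize(loss_fn, var_list=model.trainable_variables)        # partial M-step
    if step % 25 == 0:
        print(f"step {step:3d}  negative ELBO = {loss_fn().numpy():.3f}")
```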
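
For the dataset-splits row, a minimal 5-fold cross-validation skeleton is sketched below, assuming scikit-learn's KFold; `train_model` and `evaluate` are hypothetical placeholders for the paper's actual training and ELBO/NLPD evaluation code.

```python
# Minimal 5-fold cross-validation skeleton; `train_model` and `evaluate`
# are hypothetical placeholders, not functions from the paper's code base.
import numpy as np
from sklearn.model_selection import KFold

def run_cv(X, Y, train_model, evaluate, n_splits=5, seed=0):
    """Return per-fold (elbo, nlpd) pairs and their means across folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(X):
        model = train_model(X[train_idx], Y[train_idx])
        elbo, nlpd = evaluate(model, X[test_idx], Y[test_idx])
        scores.append((elbo, nlpd))
    scores = np.asarray(scores)
    # Mean over the folds, as reported for ELBO and NLPD in Fig. 3.
    return scores, scores.mean(axis=0)
```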