Dual Parameterization of Sparse Variational Gaussian Processes
Authors: Vincent Adam, Paul Chang, Mohammad Emtiyaz Khan, Arno Solin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Empirical Evaluation We conduct experiments to highlight the advantages of using the dual parameterization. Firstly, we study the effects of the improved objective for hyperparameter learning of t-SVGP versus q-SVGP. We study the objective being optimized for a single M-step, after an E-step run until convergence. We then show a full sequence of EM iterations on small data sets. For large-scale data, where running steps to convergence is expensive, we use partial E- and M-steps and mini-batching. Our improved bound and faster natural gradient computations show benefits in both settings. (A training-loop sketch of this EM alternation follows the table.) |
| Researcher Affiliation | Collaboration | Vincent Adam (Aalto University / Secondmind.ai; Espoo, Finland / Cambridge, UK) vincent.adam@aalto.fi; Paul E. Chang (Aalto University, Espoo, Finland) paul.chang@aalto.fi; Mohammad Emtiyaz Khan (RIKEN Center for AI Project, Tokyo, Japan) emtiyaz.khan@riken.jp; Arno Solin (Aalto University, Espoo, Finland) arno.solin@aalto.fi |
| Pseudocode | Yes | The full algorithm is given in App. E. |
| Open Source Code | Yes | We provide a reference implementation of our method under the GPflow framework at https://github.com/AaltoML/t-SVGP. |
| Open Datasets | Yes | MNIST ([23], available under CC BY-SA 3.0); We use common small and mid-sized UCI data sets to test the performance of our method |
| Dataset Splits | Yes | We perform 5-fold cross validation with the results in Fig. 3 showing the mean of the folds for ELBO and NLPD. (A cross-validation sketch follows the table.) |
| Hardware Specification | Yes | We compare wall-clock time to compute 150 steps of the algorithm for both methods in terms of NLPD and ELBO taking single E- and M-steps (MacBook Pro, 2 GHz CPU, 16 GB RAM). |
| Software Dependencies | Yes | We compare against the state-of-the-art implementation of SVGP in GPflow ([26], v2.2.1) |
| Experiment Setup | Yes | All experiments are performed with a batch size of nb = 200 and m = 100 inducing points, and the optimization is run until convergence using the Adam optimizer for the hyperparameters (M-step); Table 1: NLPD on MNIST benchmarks for different learning rates and E- and M-steps. (A GPflow baseline sketch with this setup follows the table.) |
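The "Research Type" row quotes an EM-style scheme: partial natural-gradient E-steps on the dual (variational) parameters alternating with Adam M-steps on the hyperparameters, with mini-batching. A minimal sketch of such a loop is below; the `natgrad_step` and `elbo` method names, their signatures, and the learning rates are assumptions for illustration, not the interface of the reference implementation at https://github.com/AaltoML/t-SVGP.

```python
import tensorflow as tf

def em_train(model, dataset, num_steps, lr_e=0.5, lr_m=0.01):
    """Alternate partial E-steps (dual parameters) and M-steps (hyperparameters)."""
    adam = tf.optimizers.Adam(learning_rate=lr_m)  # M-step optimizer, as in the paper
    for x_batch, y_batch in dataset.take(num_steps):
        # E-step: natural-gradient update of the dual (site) parameters.
        model.natgrad_step(x_batch, y_batch, lr=lr_e)  # hypothetical method name
        # M-step: one Adam step on kernel/likelihood hyperparameters via the ELBO.
        with tf.GradientTape() as tape:
            loss = -model.elbo((x_batch, y_batch))  # hypothetical signature
        hyper_vars = (model.kernel.trainable_variables
                      + model.likelihood.trainable_variables)
        adam.apply_gradients(zip(tape.gradient(loss, hyper_vars), hyper_vars))
```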
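The "Software Dependencies" and "Experiment Setup" rows pin the baseline to GPflow v2.2.1 with m = 100 inducing points, batch size nb = 200, and Adam. A hedged sketch of that q-SVGP baseline configuration follows; the synthetic data, Matérn-5/2 kernel, and Gaussian likelihood are placeholders, not the paper's UCI/MNIST pipelines.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Placeholder data; the paper uses UCI and MNIST data sets instead.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 1))
Y = np.sin(6 * X) + 0.1 * rng.standard_normal((1000, 1))

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.Matern52(),          # kernel choice assumed
    likelihood=gpflow.likelihoods.Gaussian(),  # likelihood assumed
    inducing_variable=X[:100].copy(),          # m = 100 inducing points
    num_data=X.shape[0],                       # rescales the minibatch ELBO
)
batches = iter(tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(1000).batch(200))
loss_fn = model.training_loss_closure(batches)  # negative minibatch ELBO
adam = tf.optimizers.Adam()
for _ in range(150):  # 150 steps, matching the wall-clock comparison
    adam.minimize(loss_fn, var_list=model.trainable_variables)
```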
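The "Dataset Splits" row reports means over 5-fold cross validation. A short sketch of that protocol, with a hypothetical `fit_and_score` helper standing in for model training and ELBO/NLPD evaluation:

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_mean(X, Y, fit_and_score, seed=0):
    """Fit on 4 folds, score on the held-out fold, average over the 5 folds."""
    scores = [
        fit_and_score(X[tr], Y[tr], X[te], Y[te])  # returns (ELBO, NLPD)
        for tr, te in KFold(n_splits=5, shuffle=True, random_state=seed).split(X)
    ]
    return np.mean(scores, axis=0)  # mean ELBO and mean NLPD, as in Fig. 3
```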