Scalable Training of Inference Networks for Gaussian-Process Models

Authors: Jiaxin Shi, Mohammad Emtiyaz Khan, Jun Zhu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show comparable and, sometimes, superior performance to existing sparse variational GP methods. Section 5 (Experiments): Throughout all experiments, M denotes both the number of inducing points in SVGP and the number of measurement points in GPNet and FBNN (Sun et al., 2019).
Researcher Affiliation | Academia | 1Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University, Beijing, China; 2RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. Correspondence to: Jiaxin Shi <shijx15@mails.tsinghua.edu.cn>, Jun Zhu <dcszj@tsinghua.edu.cn>.
Pseudocode | Yes | Algorithm 1: GPNet for supervised learning.
Open Source Code | Yes | Code is available at https://github.com/thjashin/gp-infer-net.
Open Datasets | Yes | We consider the inference of a GP with RBF kernel on the synthetic dataset introduced in Snelson & Ghahramani (2006). We evaluate our method on seven standard regression benchmark datasets. We conducted experiments on the airline delay dataset, which includes 5.9 million flight records in the USA from Jan to Apr in 2018. We test GPNet on MNIST and CIFAR10 with a CNN-GP prior.
Dataset Splits | Yes | The regression results are averaged over 10 random splits for small datasets (n < 5000) and 3 splits for large datasets (n >= 5000). Following the protocol in Hensman et al. (2013), we randomly take 700K points for training and 100K for testing. (See the split sketch after this table.)
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running experiments are mentioned in the paper.
Software Dependencies | No | Implementations are based on a customized version of GPflow (de G. Matthews et al., 2017; Sun et al., 2018) and ZhuSuan (Shi et al., 2017). No version numbers for software are provided.
Experiment Setup | Yes | We ran for 40K iterations and used learning rate 0.003 for all methods. For fair comparison, for all three methods we pretrain the prior hyperparameters for 100 iterations using the GP marginal likelihood and keep them fixed thereafter. We vary M in {2, 5, 20} for all methods. The networks used in GPNet and FBNN are the same RFE with 20 hidden units. (See the configuration sketch after this table.)
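
The Dataset Splits row quotes a random subsampling protocol for the airline delay data (700K training and 100K test points drawn at random, following Hensman et al., 2013). A minimal sketch of that kind of split, assuming the records are already loaded into NumPy arrays X and y (both names hypothetical, not from the paper's code):

```python
import numpy as np

def random_split(X, y, n_train=700_000, n_test=100_000, seed=0):
    """Randomly draw disjoint train/test subsets, as in the 700K/100K airline protocol."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                 # shuffle all record indices
    train_idx = idx[:n_train]                     # first 700K shuffled indices for training
    test_idx = idx[n_train:n_train + n_test]      # next 100K for testing
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])
```

For the small UCI benchmarks the same idea applies, repeated over 10 (or 3) random seeds and averaged, as described in the row above.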
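
The Experiment Setup row fixes the main training hyperparameters. The snippet below is only an illustrative sketch of how that configuration could be collected in one place; the ExperimentConfig class and its field names are hypothetical and are not taken from the gp-infer-net code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExperimentConfig:
    # Hyperparameters quoted in the Experiment Setup row; names are illustrative.
    num_iterations: int = 40_000        # training iterations for all methods
    learning_rate: float = 3e-3         # shared learning rate (0.003)
    prior_pretrain_iters: int = 100     # pretrain prior hyperparameters, then keep fixed
    measurement_points: List[int] = field(default_factory=lambda: [2, 5, 20])  # M values swept
    rfe_hidden_units: int = 20          # width of the RFE network used by GPNet and FBNN

config = ExperimentConfig()
print(config)
```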