Scalable Gaussian Process Regression Networks

Authors: Shibo Li, Wei Xing, Robert M. Kirby, Shandian Zhe

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluation, we first examined our method on three small datasets where the existing GPRN inference approaches are available. Our method shows not only better predictive performance but also a great speed-up. Then we tested our method in two real-world applications with thousands of outputs and the existing GPRN inference algorithms are not feasible. Compared with several state-of-the-art scalable multi-output regression methods, our method almost always achieves significantly better prediction accuracy. Finally, we applied GPRN in a large-scale physical simulation application for one million output prediction. Our method often improves upon the competing approaches by a large margin.
Researcher Affiliation | Academia | (1) School of Computing, University of Utah; (2) Scientific Computing and Imaging Institute, University of Utah
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions 'https://github.com/trungngv/gprn' but explicitly states it was used for 'MFVB and NPV', which are competing methods, not the authors' own implementation (SGPRN). There is no other statement providing concrete access to the authors' source code for their proposed methodology.
Open Datasets | Yes | (1) Jura, the heavy-metal concentration measurements of 349 neighbouring locations in Swiss Jura. Following [Wilson et al., 2012], we predicted 3 correlated concentrations, cadmium, nickel and zinc, given the locations of the measurements. (2) Equity [Wilson et al., 2012], a financial dataset that includes 643 records of 5 equity indices: NASDAQ, FTSE, TSE, NIKKEI and DJC. The inputs are the 5 indices, and the goal is to predict their 25 pair-wise correlations. (3) PM2.5, 100 spatial measurements (i.e., outputs) of the particulate matter pollution (PM2.5) in Salt Lake City on July 4-7, 2018. The inputs are time points of the measurements. (4) Cantilever [Andreassen et al., 2011], material structures with the maximum stiffness on bearing forces from the right side. The input of each example is the force and the output is a 3,200-dimensional vector that represents the stress field that determines the optimal material layout in an 80 × 40 rectangular domain. (5) Gene Exp, expressions of 4,511 genes (outputs) measured by different microarrays, each of which is described by a 10-dimensional input vector.
Dataset Splits | Yes | On Jura, we randomly split the data into 249 examples for training and 100 for test, on Equity 200 for training and 200 for test, and on PM2.5 256 for training and 32 for test.
Hardware Specification | Yes | We tested SGPRN, MFVB and NPV on a workstation with 2 Intel(R) Xeon(R) E5-2697 CPUs, 28 cores and 196GB memory.
Software Dependencies | No | The paper mentions 'TensorFlow' and 'MATLAB' but does not provide specific version numbers for any software dependencies. For example, it cites the TensorFlow paper but does not specify the version of the TensorFlow library used.
Experiment Setup | Yes | We used the Adam [Kingma and Ba, 2014] algorithm for gradient-based optimization and the learning rate was set to 10^-3. All the competing methods were implemented with MATLAB. We used the RBF kernel for all the methods. The input features of all the datasets were normalized, and the kernel parameters (i.e., the length-scale) were initialized to 1. We varied the number of latent functions/features/bases from {2, 5, 15, 50}. We ran our algorithm for 2,000 epochs to ensure convergence; both MFVB and NPV converged after 100 iterations.
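
Since the authors' code is not available, the reported setup can only be approximated. The following is a minimal Python sketch, not the authors' implementation, of three pieces named in the table: a random train/test split with the reported sizes, an RBF kernel with its length-scale initialized to 1, and a single Adam update with the learning rate 10^-3. The function names (random_split, rbf_kernel, adam_step), the seed, the exact RBF parameterization exp(-||x - y||^2 / (2 l^2)), and the Adam constants beta1 = 0.9, beta2 = 0.999, eps = 1e-8 are assumptions made for illustration; the paper excerpt does not state them.

```python
import math
import random

# Values quoted in the table above. Everything else is an illustrative assumption.
LEARNING_RATE = 1e-3            # Adam learning rate
INIT_LENGTH_SCALE = 1.0         # initial RBF length-scale
LATENT_SIZES = [2, 5, 15, 50]   # numbers of latent functions/features/bases tried
SPLITS = {"Jura": (249, 100), "Equity": (200, 200), "PM2.5": (256, 32)}


def random_split(n_total, n_train, n_test, seed=0):
    """Return disjoint, randomly chosen lists of training and test indices."""
    if n_train + n_test > n_total:
        raise ValueError("split sizes exceed the number of examples")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seed is hypothetical, for repeatability
    return indices[:n_train], indices[n_train:n_train + n_test]


def rbf_kernel(x, y, length_scale=INIT_LENGTH_SCALE):
    """RBF kernel exp(-||x - y||^2 / (2 * l^2)); this parameterization is assumed."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def adam_step(params, grads, state, lr=LEARNING_RATE,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters.

    `state` holds the step count "t" and the moment estimates "m" and "v".
    beta1, beta2 and eps are common defaults, not values from the source.
    """
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected second moment
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated


# Example: the Jura split (349 locations, 249 for training, 100 for test).
jura_train, jura_test = random_split(349, *SPLITS["Jura"])

# Example: one Adam step on the length-scale with a made-up gradient of 2.0.
adam_state = {"t": 0, "m": [0.0], "v": [0.0]}
new_length_scale = adam_step([INIT_LENGTH_SCALE], [2.0], adam_state)[0]
```

In a full reproduction, the gradient passed to adam_step would come from the model's training objective, and the update would be repeated for the 2,000 epochs reported in the table.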