Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters

Authors: Sulin Liu, Xingyuan Sun, Peter J. Ramadge, Ryan P. Adams

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To empirically evaluate the AHGP, we studied three different GP use cases: regression, Bayesian optimization and Bayesian quadrature. Baselines. We compare our method to the standard approach of maximizing the log marginal likelihood with respect to hyperparameters. We also compare with the sparse variational Gaussian processes method (SGPR) [67, 28], which uses inducing points to approximate the full GP. The focus of the comparisons will be on the quality of the selected kernel hyperparameters and the run time of the hyperparameter selection procedure. (A minimal sketch of the marginal-likelihood baseline appears after the table.)
Researcher Affiliation | Academia | Princeton University {sulinl, xs5, ramadge, rpa}@princeton.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code and pre-trained model are available at https://github.com/PrincetonLIPS/AHGP.
Open Datasets | Yes | We evaluate our method and the baselines on regression benchmarks from the UCI collection [1] used in Hernández-Lobato and Adams [29] and Sun et al. [66], following the same setup.
Dataset Splits | Yes | We evaluate our method and the baselines on regression benchmarks from the UCI collection [1] used in Hernández-Lobato and Adams [29] and Sun et al. [66] following the same setup: the data are randomly split to 90% for training and 10% for testing. (A minimal sketch of this split appears after the table.)
Hardware Specification | No | The paper mentions using "CPU" for GPy and "GPU" for GPyTorch and PyTorch, but does not provide specific models or detailed hardware specifications (e.g., "NVIDIA A100", "Intel Core i7").
Software Dependencies | No | The paper mentions software like GPy, GPyTorch, PyTorch, L-BFGS, and Adam, but does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | A single neural model is trained on the synthetic data using Adam [32] with a fixed learning rate, and the same trained model is then used across all evaluations. To validate the effectiveness of our neural network model, we minimize the efforts of hyperparameter tuning during training. The only hyperparameters we tuned are learning rate and number of layers in Local Transformer and Global Transformer. Average pooling is used as the Aggregate Function. Details about the hyperparameters used are included in Appendix C. During evaluation and training, both the data input and output are standardized and the noise variance of GP is fixed at 0.01. (A minimal sketch of this preprocessing appears after the table.)
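
The marginal-likelihood baseline quoted in the Research Type row is the standard approach to selecting GP hyperparameters. Below is a minimal sketch of that baseline, not the authors' code: it uses GPyTorch with an RBF kernel, Adam, and random placeholder data, and the learning rate and iteration count are illustrative assumptions.

```python
# Baseline sketch: select GP hyperparameters by maximizing the log marginal
# likelihood with a gradient-based optimizer (illustrative, not the paper's code).
import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # RBF kernel with an output scale; the paper's kernel family may differ.
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

train_x = torch.randn(100, 3)   # placeholder inputs
train_y = torch.randn(100)      # placeholder targets

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
model.train()
likelihood.train()

# Negative log marginal likelihood is the objective for the hyperparameters.
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # lr is an assumption

for _ in range(200):  # iteration count is an assumption
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```

The paper also mentions L-BFGS; the same objective can be maximized with either optimizer.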
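The Dataset Splits row quotes a 90%/10% random train/test split. A minimal sketch of such a split is shown below; scikit-learn, the random seed, and the placeholder arrays are assumptions, not the authors' preprocessing code.

```python
# 90% train / 10% test random split (illustrative; not the authors' script).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 8)  # placeholder for a UCI regression dataset
y = np.random.randn(1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)  # seed is an assumption
```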
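The Experiment Setup row states that inputs and outputs are standardized and the GP noise variance is fixed at 0.01. A minimal sketch of that preprocessing, assuming a GPyTorch likelihood with fixed per-point noise, is:

```python
# Standardize data and fix the GP observation-noise variance at 0.01
# (illustrative; the authors' exact implementation may differ).
import torch
import gpytorch

X = torch.randn(200, 5)  # placeholder inputs
y = torch.randn(200)     # placeholder targets

# Standardize inputs and outputs to zero mean and unit variance.
X_std = (X - X.mean(dim=0)) / X.std(dim=0)
y_std = (y - y.mean()) / y.std()

# Fix the noise variance at 0.01 instead of learning it.
likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(
    noise=torch.full((X_std.shape[0],), 0.01),
    learn_additional_noise=False,
)
```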