Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters
Authors: Sulin Liu, Xingyuan Sun, Peter J. Ramadge, Ryan P. Adams
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To empirically evaluate AHGP, we studied three different GP use cases: regression, Bayesian optimization, and Bayesian quadrature. Baselines: We compare our method to the standard approach of maximizing the log marginal likelihood with respect to hyperparameters. We also compare with the sparse variational Gaussian process method (SGPR) [67, 28], which uses inducing points to approximate the full GP. The focus of the comparisons is on the quality of the selected kernel hyperparameters and the run time of the hyperparameter selection procedure. (A hedged sketch of the marginal-likelihood baseline appears below the table.) |
| Researcher Affiliation | Academia | Princeton University {sulinl, xs5, ramadge, rpa}@princeton.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code and pre-trained model are available at https://github.com/PrincetonLIPS/AHGP. |
| Open Datasets | Yes | We evaluate our method and the baselines on regression benchmarks from the UCI collection [1] used in Hernández-Lobato and Adams [29] and Sun et al. [66] following the same setup |
| Dataset Splits | Yes | We evaluate our method and the baselines on regression benchmarks from the UCI collection [1] used in Hernández-Lobato and Adams [29] and Sun et al. [66] following the same setup: the data are randomly split to 90% for training and 10% for testing. |
| Hardware Specification | No | The paper mentions using "CPU" for GPy and "GPU" for GPyTorch and PyTorch, but does not provide specific models or detailed hardware specifications (e.g., "NVIDIA A100", "Intel Core i7"). |
| Software Dependencies | No | The paper mentions software like GPy, GPyTorch, PyTorch, L-BFGS, and Adam, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | A single neural model is trained on the synthetic data using Adam [32] with a fixed learning rate, and the same trained model is then used across all evaluations. To validate the effectiveness of our neural network model, we keep hyperparameter tuning during training to a minimum: the only hyperparameters we tuned are the learning rate and the number of layers in the Local Transformer and Global Transformer. Average pooling is used as the Aggregate Function. Details about the hyperparameters used are included in Appendix C. During evaluation and training, both the data inputs and outputs are standardized and the noise variance of the GP is fixed at 0.01. (A sketch of the split-and-standardize setup appears below the table.) |
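
The Research Type row quotes the baseline the paper compares against: selecting GP hyperparameters by maximizing the log marginal likelihood (type-II maximum likelihood) with L-BFGS. Below is a minimal sketch of that baseline, not the authors' AHGP model; the RBF kernel, scalar lengthscale, and log-parameterization are illustrative assumptions, while the noise variance of 0.01 matches the setup quoted in the table.

```python
# Minimal sketch: GP hyperparameter selection by maximizing the log marginal
# likelihood with L-BFGS (the standard baseline described in the paper).
# Kernel choice and parameterization are assumptions for illustration.
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import cho_factor, cho_solve

NOISE_VAR = 0.01  # noise variance fixed at 0.01, as in the quoted setup


def rbf_kernel(X1, X2, log_lengthscale, log_signal_var):
    """Squared-exponential kernel with a single scalar lengthscale (assumption)."""
    ls = np.exp(log_lengthscale)
    sv = np.exp(log_signal_var)
    sqdist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sv * np.exp(-0.5 * sqdist / ls ** 2)


def neg_log_marginal_likelihood(params, X, y):
    """Negative GP log marginal likelihood for standardized data."""
    n = X.shape[0]
    K = rbf_kernel(X, X, params[0], params[1]) + NOISE_VAR * np.eye(n)
    L, lower = cho_factor(K, lower=True)
    alpha = cho_solve((L, lower), y)
    # -log p(y) = 0.5 y^T K^{-1} y + 0.5 log|K| + 0.5 n log(2*pi)
    return (0.5 * y @ alpha
            + np.log(np.diag(L)).sum()
            + 0.5 * n * np.log(2 * np.pi))


def fit_gp_hyperparameters(X, y):
    """Optimize (log lengthscale, log signal variance) with L-BFGS."""
    result = minimize(neg_log_marginal_likelihood, x0=np.zeros(2),
                      args=(X, y), method="L-BFGS-B")
    return result.x
```

AHGP replaces this per-dataset optimization with a single amortized forward pass of a trained neural model; the sketch is only the baseline against which run time and hyperparameter quality are compared.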
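The Dataset Splits and Experiment Setup rows describe a random 90%/10% train/test split with standardized inputs and outputs. The sketch below illustrates that protocol under the assumption that standardization uses training-set statistics; the exact convention is not quoted in the table, and all names are illustrative.

```python
# Minimal sketch: random 90/10 split followed by standardization of inputs and
# outputs, matching the evaluation protocol quoted in the table. The use of
# training-set statistics for standardization is an assumption.
import numpy as np


def split_and_standardize(X, y, train_frac=0.9, seed=0):
    """Randomly split into 90% train / 10% test, then standardize both parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Standardize with training-set mean and standard deviation.
    x_mean, x_std = X_train.mean(0), X_train.std(0) + 1e-8
    y_mean, y_std = y_train.mean(), y_train.std() + 1e-8

    X_train = (X_train - x_mean) / x_std
    X_test = (X_test - x_mean) / x_std
    y_train = (y_train - y_mean) / y_std
    y_test = (y_test - y_mean) / y_std
    return X_train, y_train, X_test, y_test
```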