PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees

Authors: Jonas Rothfuss, Vincent Fortuin, Martin Josifoski, Andreas Krause

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we instantiate our framework with Gaussian Processes (GPs) and Bayesian Neural Networks (BNNs) as base learners. Across several regression and classification environments, our proposed approach achieves state-of-the-art predictive accuracy, while also improving the calibration of the uncertainty estimates.
Researcher Affiliation | Academia | 1 ETH Zurich, Switzerland; 2 EPFL, Switzerland.
Pseudocode | Yes | Algorithm 1: PACOH with SVGD approximation of Q (a hedged sketch follows the table).
Open Source Code | Yes | The source code for PACOH-GP is available at tinyurl.com/pacoh-gp-code. An implementation of PACOH-NN can be found at tinyurl.com/pacoh-nn-code.
Open Datasets | Yes | Swiss Free Electron Laser (SwissFEL) (Milne et al., 2017; Kirschner et al., 2019b), PhysioNet 2012 challenge (Silva et al., 2012), Intel Berkeley Research Lab temperature sensor dataset (Berkeley-Sensor) (Madden, 2004), Omniglot (Lake et al., 2015)
Dataset Splits | No | The paper mentions 30 meta-train and 20 meta-test tasks for Omniglot, and refers to 'target training' and 'target testing' in Figure 1. However, it does not provide specific percentages or counts for train/validation/test dataset splits within each task in the main text.
Hardware Specification | No | The paper discusses computational complexity and memory usage but does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running the experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | we use λ = n, β = m, the negative log-likelihood as loss function and a Gaussian hyper-prior P = N(0, σ_P² I) over the GP prior parameters φ. For regression, we may set p(y|x, θ) = N(y | h_θ(x), σ²)... For classification, we choose p(y|x, θ) = Categorical(softmax(h_θ(x))). Our loss function is the negative log-likelihood... we employ diagonal Gaussian priors, that is, P_φk = N(µ_Pk, diag(σ²_Pk)) with φ_k := (µ_Pk, ln σ_Pk)... Moreover, we use a zero-centered, spherical Gaussian hyper-prior P := N(0, σ_P² I) over the prior parameters φ. Input: SVGD kernel function k(·, ·), step size η.
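Algorithm 1 in the paper approximates the PACOH hyper-posterior Q with Stein Variational Gradient Descent (SVGD) over a set of prior-parameter particles φ_k. Below is a minimal, hypothetical sketch of such an SVGD particle update, not the authors' reference implementation (the linked repositories contain that). The RBF kernel with a fixed bandwidth, the function names (`rbf_kernel`, `svgd_step`, `grad_log_q`), and the toy target in the usage example are assumptions; the actual hyper-posterior score (gradient of the log hyper-prior plus the weighted per-task marginal log-likelihood terms defined in the paper) is left as a user-supplied callback.

```python
# Minimal, hypothetical sketch of an SVGD update for particles approximating
# the PACOH hyper-posterior Q over prior parameters phi (cf. Algorithm 1).
# Not the authors' code; kernel choice and names are illustrative assumptions.
import numpy as np

def rbf_kernel(particles, bandwidth=1.0):
    """RBF kernel matrix and its gradient w.r.t. the first argument.

    grad_k[i, j] = d k(phi_i, phi_j) / d phi_i
    """
    diffs = particles[:, None, :] - particles[None, :, :]   # (K, K, D)
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (K, K)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))           # kernel matrix
    grad_k = -diffs / bandwidth ** 2 * k[:, :, None]         # (K, K, D)
    return k, grad_k

def svgd_step(particles, grad_log_q, step_size=1e-3):
    """One SVGD update of K particles approximating the hyper-posterior Q.

    particles:  (K, D) array of prior parameters phi_k = (mu_P, ln sigma_P)
    grad_log_q: callback mapping (K, D) particles to (K, D) gradients of the
                log hyper-posterior (log hyper-prior + weighted per-task
                marginal log-likelihood terms, as defined in the paper).
    """
    k, grad_k = rbf_kernel(particles)
    scores = grad_log_q(particles)                            # (K, D)
    # Attraction term: kernel-weighted scores; repulsion term: sum over j of
    # d k(phi_j, phi_i) / d phi_j, which keeps particles spread out.
    phi_update = (k @ scores + grad_k.sum(axis=0)) / particles.shape[0]
    return particles + step_size * phi_update

# Toy usage with a standard-normal stand-in for the hyper-posterior score;
# replace grad_log_q with the PACOH score to mirror Algorithm 1.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(10, 4))                            # K=10 particles, D=4
    for _ in range(100):
        phi = svgd_step(phi, grad_log_q=lambda p: -p, step_size=0.1)
```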
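The setup quoted in the last row also fixes the likelihoods and the prior parameterization. The fragment below is a rough, assumed translation of those choices into code: a diagonal Gaussian prior over base-learner parameters θ with φ = (µ_P, ln σ_P), a zero-centered spherical Gaussian hyper-prior N(0, σ_P² I) over φ, and negative log-likelihood losses for Gaussian regression and softmax classification. Function names and default noise/variance values are illustrative placeholders, not values taken from the paper.

```python
# Hypothetical illustration of the quoted setup choices (not the authors' code).
import numpy as np

def prior_log_prob(theta, phi):
    """log N(theta | mu_P, diag(sigma_P^2)) with phi = (mu_P, ln sigma_P)."""
    mu, log_sigma = np.split(phi, 2)
    var = np.exp(2.0 * log_sigma)
    return -0.5 * np.sum((theta - mu) ** 2 / var + np.log(2.0 * np.pi * var))

def hyper_prior_log_prob(phi, sigma_hp=1.0):
    """Zero-centered spherical Gaussian hyper-prior: log N(phi | 0, sigma_hp^2 I)."""
    return -0.5 * np.sum(phi ** 2 / sigma_hp ** 2 + np.log(2.0 * np.pi * sigma_hp ** 2))

def nll_regression(y, y_pred, sigma=0.1):
    """NLL under p(y | x, theta) = N(y | h_theta(x), sigma^2)."""
    return 0.5 * np.sum((y - y_pred) ** 2 / sigma ** 2 + np.log(2.0 * np.pi * sigma ** 2))

def nll_classification(y_onehot, logits):
    """NLL under p(y | x, theta) = Categorical(softmax(h_theta(x)))."""
    z = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -np.sum(y_onehot * log_probs)
```

In this reading, the prior parameters φ are exactly the quantities the SVGD particles range over, and the regression/classification NLLs play the role of the loss function named in the setup row.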