PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees
Authors: Jonas Rothfuss, Vincent Fortuin, Martin Josifoski, Andreas Krause
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we instantiate our framework with Gaussian Processes (GPs) and Bayesian Neural Networks (BNNs) as base learners. Across several regression and classification environments, our proposed approach achieves state-of-the-art predictive accuracy, while also improving the calibration of the uncertainty estimates. |
| Researcher Affiliation | Academia | ETH Zurich, Switzerland; EPFL, Switzerland. |
| Pseudocode | Yes | Algorithm 1 PACOH with SVGD approximation of Q (see the SVGD sketch after the table) |
| Open Source Code | Yes | The source code for PACOH-GP is available at tinyurl.com/pacoh-gp-code. An implementation of PACOH-NN can be found at tinyurl.com/pacoh-nn-code. |
| Open Datasets | Yes | Swiss Free Electron Laser (Swiss FEL) (Milne et al., 2017; Kirschner et al., 2019b), PhysioNet 2012 challenge (Silva et al., 2012), Intel Berkeley Research Lab temperature sensor dataset (Berkeley-Sensor) (Madden, 2004), Omniglot (Lake et al., 2015) |
| Dataset Splits | No | The paper mentions 30 meta-train and 20 meta-test tasks for Omniglot, and refers to 'target training' and 'target testing' in Figure 1. However, it does not provide specific percentages or counts for train/validation/test dataset splits within each task in the main text. |
| Hardware Specification | No | The paper discusses computational complexity and memory usage but does not provide specific hardware details such as GPU/CPU models, memory amounts, or detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | we use λ = n, β = m, the negative log-likelihood as loss function and a Gaussian hyper-prior P = N(0, σ_P² I) over the GP prior parameters φ. For regression, we may set p(y|x, θ) = N(y|h_θ(x), σ²)... For classification, we choose p(y|x, θ) = Categorical(softmax(h_θ(x))). Our loss function is the negative log-likelihood... we employ diagonal Gaussian priors, that is, P_φ_k = N(µ_P_k, diag(σ_P_k²)) with φ_k := (µ_P_k, ln σ_P_k)... Moreover, we use a zero-centered, spherical Gaussian hyper-prior P := N(0, σ_P² I) over the prior parameters φ. Input: SVGD kernel function k(·, ·), step size η. (A sketch of these quantities follows the table.) |
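
Below is a minimal sketch of a single SVGD update over prior-parameter particles, illustrating the "SVGD approximation of Q" named in Algorithm 1. It is not the authors' implementation: the particle array `phi`, the `score_fn` callable (standing in for the gradient of the log hyper-posterior), and the median-heuristic bandwidth are assumptions of this sketch.

```python
# Minimal SVGD sketch (not the paper's code): K prior-parameter particles
# phi are moved toward the hyper-posterior whose log-density gradient is
# provided by the assumed callable `score_fn`.
import numpy as np

def rbf_kernel(phi, bandwidth=None):
    """RBF kernel matrix over particles and its gradient w.r.t. the first argument."""
    sq_dists = np.sum((phi[:, None, :] - phi[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:  # median heuristic, a common SVGD default
        bandwidth = np.median(sq_dists) / np.log(len(phi) + 1) + 1e-8
    K = np.exp(-sq_dists / bandwidth)
    # grad_K[j, i, :] = d k(phi_j, phi_i) / d phi_j
    grad_K = -2.0 / bandwidth * (phi[:, None, :] - phi[None, :, :]) * K[:, :, None]
    return K, grad_K

def svgd_step(phi, score_fn, step_size=1e-3):
    """One SVGD update of the particle set phi (shape: [K, dim])."""
    scores = score_fn(phi)            # [K, dim] gradients of the log-target
    K, grad_K = rbf_kernel(phi)       # [K, K] and [K, K, dim]
    # Attractive term pulls particles toward high-density regions,
    # repulsive term (kernel gradient) keeps them spread out.
    update = (K @ scores + grad_K.sum(axis=0)) / len(phi)
    return phi + step_size * update
```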
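
The experiment setup quoted above can be made concrete with a short sketch of the three quantities it names: the Gaussian regression negative log-likelihood, the diagonal Gaussian prior P_φ with φ = (µ_P, ln σ_P), and the zero-centered spherical Gaussian hyper-prior N(0, σ_P² I). This is an illustrative NumPy rendering, not the paper's code; the noise scale `sigma` and hyper-prior scale `sigma_P` are placeholder values.

```python
# Illustrative sketch (assumed, not from the paper's repositories) of the
# likelihood, prior, and hyper-prior described in the experiment setup.
import numpy as np

def gaussian_nll(y, y_pred, sigma=0.1):
    """Negative log-likelihood of N(y | h_theta(x), sigma^2), summed over points."""
    return 0.5 * np.sum(((y - y_pred) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2))

def log_prior(theta, phi):
    """log P_phi(theta) for a diagonal Gaussian prior; phi concatenates mu_P and ln sigma_P."""
    mu_P, log_sigma_P = np.split(phi, 2)   # each half has the same size as theta
    var = np.exp(2 * log_sigma_P)
    return -0.5 * np.sum((theta - mu_P) ** 2 / var + np.log(2 * np.pi * var))

def log_hyper_prior(phi, sigma_P=1.0):
    """Zero-centered spherical Gaussian hyper-prior log N(phi | 0, sigma_P^2 I)."""
    return (-0.5 * np.sum(phi ** 2) / sigma_P ** 2
            - 0.5 * len(phi) * np.log(2 * np.pi * sigma_P ** 2))
```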