Transformers Can Do Bayesian Inference
Authors: Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, Frank Hutter
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. In our first set of experiments, we study the capability of PFNs to perform Bayesian inference for the tractable case of Gaussian Processes (GPs) with fixed hyperparameters (where we can compare to ground truth data; Section 5.1) and the intractable cases of GPs with unknown hyperparameters (Section 5.2) and Bayesian Neural Networks (BNNs; Section 5.3). |
| Researcher Affiliation | Collaboration | Samuel Müller¹, Noah Hollmann², Sebastian Pineda¹, Josif Grabocka¹, Frank Hutter¹,³; ¹University of Freiburg, ²Charité Berlin, ³Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: Training a PFN model by Fitting Prior-Data (a minimal sketch of this training loop appears after the table). |
| Open Source Code | Yes | Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference. |
| Open Datasets | Yes | We used a large collection of tabular datasets from the open-source OpenML AutoML Benchmark (Gijsbers et al., 2019); we first removed datasets with more than one hundred features or missing values, ending up with 20 datasets that represent a diverse set of classification problems with numerical and categorical features. |
| Dataset Splits | Yes | We also define a set of six unrelated validation datasets used for optimizing the prior distribution over architectures of the PFNs. This is similar to setting the range of hyperparameters in a cross-validation grid search and can be reused for all similar problems. See Appendix G for more details. We used grid search with 5-fold cross-validation to optimize our baselines' hyperparameters for each dataset, using the hyperparameter spaces described in Table 6 in the appendix. For each dataset, we sampled 20 subsets, each including 100 samples. Within each subset we provide labels for the first 30 samples and evaluate on the remaining samples. (A sketch of this subsampling protocol appears after the table.) |
| Hardware Specification | Yes | when run on a GPU (Nvidia Tesla V100), it requires as little as 13 seconds for all 20 datasets combined. |
| Software Dependencies | No | The paper mentions “PyTorch (Paszke et al., 2019)” and “Pyro (Bingham et al., 2018)” but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | For all experiments we used an embedding size of 512; only for few-shot classification did we use 1024. The only hyperparameters that we fine-tuned for the Transformer training were the batch size and the learning rate. We used a learning rate of 1e-5 to yield the performance shown in the plot after sampling 500,000 samples from the training tasks. Table 5: Hyperparameters considered during grid search tuning of the PFN-BNN on validation datasets. |
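The pseudocode row above refers to Algorithm 1, which fits a PFN by repeatedly sampling whole datasets from the prior and training the transformer to predict held-out labels. Below is a minimal sketch of that loop, not the authors' released implementation: `model` and `sample_prior_dataset` are assumed stand-in callables, the 30/70 context/query split mirrors the evaluation protocol quoted above, and the paper's discretized (Riemann) output head is reduced here to plain cross-entropy over class logits.

```python
import torch
from torch import nn

def train_pfn(model, sample_prior_dataset, optimizer,
              steps=1000, n_train=30, n_total=100):
    """Sketch of Algorithm 1 (Fitting Prior-Data): train on datasets drawn
    from the prior p(D). `model` is assumed to map a labeled context set plus
    query inputs to per-query logits; `sample_prior_dataset` is assumed to
    return tensors x of shape (n_total, d) and integer labels y of shape
    (n_total,). Both are hypothetical stand-ins, not the authors' API."""
    loss_fn = nn.CrossEntropyLoss()  # NLL of held-out labels under the PPD
    for _ in range(steps):
        x, y = sample_prior_dataset(n_total)     # one synthetic dataset from the prior
        ctx_x, ctx_y = x[:n_train], y[:n_train]  # labeled context ("training") part
        qry_x, qry_y = x[n_train:], y[n_train:]  # held-out part to predict
        logits = model(ctx_x, ctx_y, qry_x)      # approximation of q(y | x, D)
        loss = loss_fn(logits, qry_y)            # prior-data negative log-likelihood
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```

Note the design point this makes concrete: the transformer never sees a real dataset during training; every gradient step uses a fresh dataset sampled from the prior, which is what lets a single forward pass at test time act as approximate Bayesian inference.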
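The dataset-splits row quotes an evaluation protocol of 20 random subsets of 100 samples per dataset, with labels provided for the first 30 samples. A small sketch of how such splits could be drawn, assuming NumPy arrays and a seeding scheme the paper does not specify (the function name is our own):

```python
import numpy as np

def make_eval_subsets(X, y, n_subsets=20, subset_size=100, n_labeled=30, seed=0):
    """Hypothetical reconstruction of the quoted protocol: for each dataset,
    draw 20 random subsets of 100 samples; within each subset the first 30
    serve as labeled context and the remaining 70 are used for evaluation."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=subset_size, replace=False)
        labeled, held_out = idx[:n_labeled], idx[n_labeled:]
        splits.append(((X[labeled], y[labeled]), (X[held_out], y[held_out])))
    return splits
```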