Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models
Authors: Julius Vetter, Manuel Gloeckler, Daniel Gedon, Jakob H Macke
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess NPE-PFN, we conduct experiments on synthetic SBI benchmark tasks and real data, covering scenarios from low to high-dimensional data and including cases with model misspecification. We evaluate NPE-PFN on various tasks from the SBI benchmark [27], which provides ground truth posterior samples for 10 observations for each task. We measure posterior sample quality using the classifier two-sample test (C2ST, 53). |
| Researcher Affiliation | Academia | (1) Machine Learning in Science, University of Tübingen, Tübingen, Germany; (2) Tübingen AI Center, Tübingen, Germany; (3) Department Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, Germany. {firstname.lastname}@uni-tuebingen.de |
| Pseudocode | Yes | pseudocode for NPE-PFN in Appendix Alg. 1. ... pseudocode for TSNPE-PFN in Appendix Alg. 2. |
| Open Source Code | Yes | Code available at https://github.com/mackelab/npe-pfn. ... Code to use NPE-PFN and reproduce the results is available at https://github.com/mackelab/npe-pfn. |
| Open Datasets | Yes | We evaluate NPE-PFN on various tasks from the SBI benchmark [27] ... We infer posteriors for 10 real observations from the Allen cell type database [61] ... we apply TabPFN on some classical unconditional density estimation benchmark tasks from the UCI repository [56]. |
| Dataset Splits | Yes | Training was stopped early based on the validation loss, as evaluated on a held-out set containing 10% of the available simulations. ... We equally divide the simulation budget into 10 rounds ... we use 10^3, 10^4, or (if applicable) 10^5 samples for training. |
| Hardware Specification | Yes | We use a mix of Nvidia 2080TI, A100, and H100 GPUs to obtain the results related to NPE-PFN. ... SBI baselines were run on 8 CPU cores ... All runtimes for NPE-PFN (Fig. 2b) were obtained using an Nvidia A100 GPU, where possible. For the unfiltered variant of NPE-PFN, an H100 GPU was used for the large context containing 10^5 simulations. |
| Software Dependencies | No | The paper mentions PyTorch [86], the TabPFN library [38], the SBI library [54], Hydra [87], and the Adam optimizer [90], and refers to specific flow types (neural spline flow [42], masked autoregressive flow [41]). However, it does not provide version numbers for these software components, only citations to their respective papers. |
| Experiment Setup | Yes | Training was performed using the Adam optimizer [90] with a batch size of 200 and a learning rate of 5 x 10^-4. Training was stopped early based on the validation loss... In all experiments, we use the default version of the TabPFN classifier or regressor for (TS)NPE-PFN, with no changes to hyperparameters such as the softmax temperature. |
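
The Dataset Splits and Experiment Setup rows quote a 10% held-out validation set, early stopping on the validation loss, a batch size of 200, a learning rate of 5 x 10^-4, and a simulation budget divided equally into 10 rounds. The following is a minimal sketch of how those quoted values might be collected and applied, using only the Python standard library. It is not the authors' code: the names `BaselineConfig`, `split_indices`, `per_round_budget`, and `early_stopping_epoch` are hypothetical, and the `patience` value and the random seed are assumptions, since the quoted text does not state them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Values quoted in the table, plus one labeled assumption."""
    batch_size: int = 200        # quoted: batch size of 200
    learning_rate: float = 5e-4  # quoted: learning rate of 5 x 10^-4
    val_fraction: float = 0.1    # quoted: held-out set of 10% of simulations
    n_rounds: int = 10           # quoted: budget divided into 10 rounds
    patience: int = 20           # ASSUMPTION: not stated in the quoted text


def split_indices(n_simulations, val_fraction, seed=0):
    """Shuffle simulation indices and hold out a validation fraction.

    The seed is an assumption made only to keep the sketch deterministic.
    """
    indices = list(range(n_simulations))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_simulations * val_fraction))
    return indices[n_val:], indices[:n_val]


def per_round_budget(total_budget, n_rounds):
    """Divide a simulation budget equally across sequential rounds."""
    if total_budget % n_rounds != 0:
        raise ValueError("budget must be divisible by the number of rounds")
    return [total_budget // n_rounds] * n_rounds


def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return len(val_losses) - 1


if __name__ == "__main__":
    cfg = BaselineConfig()
    train_idx, val_idx = split_indices(10**4, cfg.val_fraction)
    print(len(train_idx), len(val_idx))
    print(per_round_budget(10**4, cfg.n_rounds))
```

With a budget of 10^4 simulations, this split holds out 1,000 simulations for validation and leaves 9,000 for training, and the sequential variant would receive 1,000 simulations per round. Per the quoted Experiment Setup row, (TS)NPE-PFN itself uses the default TabPFN classifier or regressor without hyperparameter changes, so the optimizer settings are shown here only as the values the table reports.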