Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference

Authors: Stefano Cortinovis, François Caron

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the benefits of FAB-PPI in real and synthetic examples. [...] We compare FAB-PPI and power-tuned FAB-PPI (FAB-PPI++) to classical inference, PPI and power-tuned PPI (PPI++) on both synthetic and real estimation problems. For FAB-PPI, we use (HS) and (N) to indicate the use of the horseshoe and Gaussian priors defined in Section 4.3. [...] For all experiments, we set α = 0.1 and report the average mean squared error (MSE), interval volume, and coverage over 1000 repetitions.
Researcher Affiliation | Academia | Department of Statistics, University of Oxford. Correspondence to: Stefano Cortinovis <EMAIL>.
Pseudocode | Yes | Algorithm 1 summarises the steps of the FAB-PPI approach in a general convex estimation problem. [...] Algorithm 2 summarises the FAB-PPI approach under the squared loss...
Open Source Code | Yes | Code for reproducing the experiments is available at https://github.com/stefanocortinovis/fab-ppi.
Open Datasets | Yes | We consider several estimation experiments using the datasets presented in Angelopoulos et al. (2023a) and briefly described in Section S5.1. [...] All of the datasets were downloaded from the examples provided as part of the ppi-py package (Angelopoulos et al., 2023b).
Dataset Splits | Yes | We sample two datasets, n labelled observations {(X_i, Y_i)}_{i=1}^n i.i.d. from P and N unlabelled observations {X̃_i}_{i=1}^N i.i.d. from P_X. [...] For this experiment, we assume that N is infinite, set n = 200, and vary γ between −1.5 and 1.5. [...] For this experiment, we set N = 10^6 and vary n from 100 to 1000. [...] Each dataset comes with covariate/label/prediction triples {(X_i, Y_i, f(X_i))}_{i=1}^N, which we randomly split into two subsets with n labelled and N − n unlabelled observations, for varying values of n.
Hardware Specification | Yes | All of the experiments presented here were run locally on an Intel Core i7-11850H CPU.
Software Dependencies | No | Code implementing the FAB-PPI method is written in Python and made available at https://github.com/stefanocortinovis/fab-ppi. Comparisons with standard PPI are performed using the ppi-py package (Angelopoulos et al., 2023b).
Experiment Setup | Yes | For all experiments, we set α = 0.1 and report the average mean squared error (MSE), interval volume, and coverage over 1000 repetitions. [...] We sample two datasets, n labelled observations {(X_i, Y_i)}_{i=1}^n i.i.d. from P and N unlabelled observations {X̃_i}_{i=1}^N i.i.d. from P_X. [...] In this experiment, we assume that N is infinite, set n = 200, and vary γ between −1.5 and 1.5. [...] For this experiment, we set N = 10^6 and vary n from 100 to 1000.
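To make the experimental protocol above concrete, here is a minimal NumPy sketch of the *classical* PPI baseline that FAB-PPI is compared against: estimate a mean from n labelled pairs (Y_i, f(X_i)) plus N unlabelled predictions f(X̃_i), and form a level-(1 − α) interval with α = 0.1 as in the quoted setup. This is an illustration of standard prediction-powered inference (Angelopoulos et al., 2023a), not the FAB-PPI method itself; the function name `ppi_mean_ci` and the toy data-generating process are assumptions for the sketch, not code from the paper's repository.

```python
import numpy as np
from scipy import stats

def ppi_mean_ci(y, yhat_lab, yhat_unlab, alpha=0.1):
    """Classical PPI point estimate and CI for a mean.

    y           -- labels Y_i on the n labelled points
    yhat_lab    -- predictions f(X_i) on the labelled points
    yhat_unlab  -- predictions f(X~_i) on the N unlabelled points
    """
    n, N = len(y), len(yhat_unlab)
    rectifier = y - yhat_lab                      # corrects the predictor's bias
    theta = yhat_unlab.mean() + rectifier.mean()  # PPI point estimate
    # Asymptotic standard error: unlabelled-prediction noise + rectifier noise
    se = np.sqrt(yhat_unlab.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = stats.norm.ppf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)

# Toy example mirroring the quoted setup: n = 200 labelled, N = 10^6 unlabelled
rng = np.random.default_rng(0)
n, N = 200, 10**6
x_lab, x_unlab = rng.normal(size=n), rng.normal(size=N)
f = lambda x: x + 0.1                        # a deliberately biased predictor
y = x_lab + rng.normal(scale=0.5, size=n)    # true mean of Y is 0
theta, (lo, hi) = ppi_mean_ci(y, f(x_lab), f(x_unlab), alpha=0.1)
```

With many unlabelled points, the interval width is driven by the rectifier variance over the n labelled points; FAB-PPI's contribution, per the paper, is to shrink this baseline further via horseshoe or Gaussian priors.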