Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Anytime-valid, Bayes-assisted, Prediction-Powered Inference
Authors: Valentin Kilian, Stefano Cortinovis, François Caron
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments We compare the PPI and PPI++ Asymp CS procedures introduced in Section 5 with and without Bayes assistance to the Asymp CS relying solely on labelled data (obtained from Theorem 1 and referred to as classical) on several estimation problems... 6.1 Synthetic data... 6.2 Real data |
| Researcher Affiliation | Academia | Valentin Kilian Department of Statistics, University of Oxford EMAIL; Stefano Cortinovis Department of Statistics, University of Oxford EMAIL; François Caron Department of Statistics, University of Oxford EMAIL |
| Pseudocode | No | The paper describes methods conceptually and mathematically, including propositions and theorems, but does not present a formal pseudocode or algorithm block. |
| Open Source Code | Yes | As stated in the supplementary material, the code used to perform our experiments is made available online under a permissive licence. |
| Open Datasets | Yes | We evaluate our method on several real-world datasets, which are described in Section S6.2. ... Figure 3 compares classical and PPI++ Asymp CS procedures on the FLIGHTS, FOREST, and GALAXIES datasets... Figure S8 reports results for three additional estimation tasks: linear regression (CENSUS), logistic regression (HEALTHCARE), and quantile estimation (GENES)... All datasets have permissive licenses and are properly credited in the supplementary material. |
| Dataset Splits | Yes | we simulate an online setting akin to Section 6.1 by randomly splitting the data into a labelled set of size n1, serving as a labelled data stream, and an unlabelled set of size N. |
| Hardware Specification | Yes | All experiments were run locally on an Apple Silicon M4 Pro CPU with 24GB of memory, and implementation details are provided in the supplementary material. |
| Software Dependencies | No | The main text of the paper does not specify software dependencies with version numbers. It mentions 'implementation details are provided in the supplementary material', but these are not in the main paper. |
| Experiment Setup | Yes | For synthetic data, we set N = unlabelled samples $\{\tilde{X}_j\}_{j=1}^{N} \overset{\text{iid}}{\sim} P_X$ and successively sample $n$ labelled observations $(X_i, Y_i)_{i=1}^{n} \overset{\text{iid}}{\sim} P$ with the goal of estimating the mean $\theta = E[Y]$. ... Noisy predictions. This experiment demonstrates that our method can adapt to varying correlation levels between predictions and true labels by using the PPI++ estimator (23). We sample $Y_i \overset{\text{iid}}{\sim} N(0, 1)$, so that $\theta = E[Y] = 0$. The prediction rule is defined as $f(X_i) = Y_i + \epsilon_i$, where $X_i$ is only used for indexing and $\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma_Y^2)$, with the noise level $\sigma_Y \in \{0.1, 0.8, 3\}$. ... Biased predictions. This experiment illustrates the potential benefits of incorporating prior information into our method. We sample $X_i \overset{\text{iid}}{\sim} N(0, 1)$ and $Y_i = X_i + \epsilon_i$, where $\epsilon_i \overset{\text{iid}}{\sim} t_{df}(0, 1)$, so that $\theta = E[Y] = 0$. The prediction rule is defined as $f(X_i) = X_i + \upsilon$, where $\upsilon \in \mathbb{R}$ controls its bias level. ... We vary $\upsilon$ between $-1.2$ and $1.2$, and $df \in \{5, 10, \infty\}$ to study the impact of bias level and noise distribution on the Asymp CS procedures. |
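
The synthetic settings quoted in the Experiment Setup row, and the random labelled/unlabelled split quoted in the Dataset Splits row, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names (`noisy_predictions`, `biased_predictions`, `split_stream`), the seed, and the sample sizes in the usage lines are hypothetical. An infinite `df` is treated here as standard Gaussian noise, the limiting case of the Student-t distribution. The sketch covers data generation only and does not implement the confidence sequence procedures.

```python
import numpy as np


def noisy_predictions(n, sigma_y, rng):
    """Noisy setting: Y_i ~ N(0, 1), f(X_i) = Y_i + eps_i, eps_i ~ N(0, sigma_y^2)."""
    y = rng.standard_normal(n)
    preds = y + rng.normal(0.0, sigma_y, size=n)
    return y, preds


def biased_predictions(n, bias, df, rng):
    """Biased setting: X_i ~ N(0, 1), Y_i = X_i + eps_i, eps_i ~ t_df, f(X_i) = X_i + bias."""
    x = rng.standard_normal(n)
    if np.isinf(df):
        # Assumption: infinite degrees of freedom means Gaussian noise.
        noise = rng.standard_normal(n)
    else:
        noise = rng.standard_t(df, size=n)
    y = x + noise
    preds = x + bias
    return x, y, preds


def split_stream(n_total, n_labelled, rng):
    """Randomly split indices into a labelled stream and an unlabelled pool."""
    perm = rng.permutation(n_total)
    return perm[:n_labelled], perm[n_labelled:]


# Hypothetical usage: the seed and sizes below are illustrative only.
rng = np.random.default_rng(0)
y_lab, f_lab = noisy_predictions(200, sigma_y=0.8, rng=rng)
x_b, y_b, f_b = biased_predictions(200, bias=1.2, df=5, rng=rng)
labelled_idx, unlabelled_idx = split_stream(1000, 200, rng)
```

For the unlabelled pool, the same generators can be called and the labels discarded, so that only the predictions are kept.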