Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Synthetic-powered predictive inference

Authors: Meshi Bashari, Roy Maor Lotan, Yonghoon Lee, Edgar Dobriban, Yaniv Romano

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on image classification augmenting data with synthetic diffusion-model generated images and on tabular regression demonstrate notable improvements in predictive efficiency in data-scarce settings.
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering, Technion, Israel Institute of Technology; (2) Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA; (3) Department of Computer Science, Technion, Israel Institute of Technology
Pseudocode	Yes	Algorithm 1 Synthetic-powered predictive inference (SPI) ... Algorithm 2 SPI with data-dependent k-nearest subset selection ... Algorithm 3 Cramér-von Mises two-sample test statistic ... Algorithm 4 β-selection
Open Source Code	Yes	Software for reproducing the experiments is available at https://github.com/Meshiba/spi.
Open Datasets	Yes	We demonstrate the practicality of our method on multi-class classification and regression tasks. For image classification on ImageNet, we explore two practical strategies for constructing synthetic data... For the regression task, suppose we have a pre-trained quantile regression model that estimates the $\gamma$-th quantile of the distribution $Y \mid X$, denoted as $\hat{q}(X; \gamma)$. The conformalized quantile regression (CQR) score is then defined as $s(X, Y) = \max\{\hat{q}(X; \alpha/2) - Y,\ Y - \hat{q}(X; 1 - \alpha/2)\}$. Applying conformal prediction with this score, the prediction set takes the form $\hat{C}(X_{n+1}) = [\hat{q}(X_{n+1}; \alpha/2) - \hat{Q}_{1-\alpha},\ \hat{q}(X_{n+1}; 1 - \alpha/2) + \hat{Q}_{1-\alpha}]$.
Dataset Splits	Yes	For the marginal coverage experiments, we randomly select m = 15 ImageNet images from the real data, chosen from among 30 classes, to construct the real calibration set. The test set consists of 15,000 real images, and the synthetic calibration set includes N = 1,000 generated images, sampled uniformly across all classes. For the label-conditional experiments, we randomly select m = 15 real images for each of the k = 30 classes to form the real calibration set (resulting in mk = 450 real data points), 500 real images per class to form the test set, and n = 1,000 generated images per class to form the synthetic calibration set (resulting in N = nk = 30,000 synthetic data points).
Hardware Specification	Yes	The experiments were conducted on a system running Ubuntu 20.04.6 LTS, with 192 CPU cores of Intel(R) Xeon(R) Gold CPUs at 2.40 GHz, 1 TB of RAM, and 16 NVIDIA A40 GPUs.
Software Dependencies	Yes	The software environment used Python 3.11.5, PyTorch 2.6, and CUDA 12.2.
Experiment Setup	Yes	For the marginal coverage experiments, we randomly select m = 15 ImageNet images from the real data, chosen from among 30 classes, to construct the real calibration set. The test set consists of 15,000 real images, and the synthetic calibration set includes N = 1,000 generated images, sampled uniformly across all classes. For the label-conditional experiments, we randomly select m = 15 real images for each of the k = 30 classes to form the real calibration set (resulting in mk = 450 real data points), 500 real images per class to form the test set, and n = 1,000 generated images per class to form the synthetic calibration set (resulting in N = nk = 30,000 synthetic data points).
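
The conformalized quantile regression (CQR) construction quoted in the Open Datasets row can be illustrated with a short sketch. This is not the authors' SPI method and is not taken from their repository: it is ordinary split conformal prediction with the CQR score, and it assumes the calibration quantile is taken with the usual finite-sample rank ceil((1 - alpha)(n + 1)). The function names (`cqr_score`, `conformal_quantile`, `cqr_interval`) and the toy numbers are hypothetical.

```python
import math

def cqr_score(q_lo, q_hi, y):
    """CQR score s(X, Y) = max{q_lo - Y, Y - q_hi}, where q_lo and q_hi are
    the model's alpha/2 and 1 - alpha/2 quantile estimates at X."""
    return max(q_lo - y, y - q_hi)

def conformal_quantile(scores, alpha):
    """Return the ceil((1 - alpha) * (n + 1))-th smallest calibration score.
    If that rank exceeds n, return infinity (the interval is the whole line).
    Assumption: standard split-conformal correction, not the SPI procedure."""
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]

def cqr_interval(q_lo_new, q_hi_new, q_hat):
    """Widen (or shrink, if q_hat < 0) the quantile band by q_hat on each side."""
    return (q_lo_new - q_hat, q_hi_new + q_hat)

# Toy calibration set of (q_lo, q_hi, y) triples; values are made up.
calibration = [(1.0, 3.0, 2.0), (0.0, 2.0, 2.5), (1.0, 2.0, 0.0)]
scores = [cqr_score(lo, hi, y) for lo, hi, y in calibration]
q_hat = conformal_quantile(scores, alpha=0.5)
interval = cqr_interval(1.0, 2.0, q_hat)
print(scores, q_hat, interval)
```

With only three calibration points, any alpha below 0.25 pushes the rank past n and yields an unbounded interval. That is the data-scarce regime (m = 15 real images in the experiments above) that motivates adding synthetic calibration data.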