Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

False Coverage Proportion Control for Conformal Prediction

Authors: Alexandre Blain, Bertrand Thirion, Pierre Neuvial

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experimental validation on OpenML datasets, we demonstrate that our proposed methods effectively control the FCP and produce sharp prediction intervals. We use 17 OpenML (Vanschoren et al., 2014) datasets from (Grinsztajn et al., 2022). Each dataset is randomly split (n_split = 30 times) into a train, calibration and test set.
Researcher Affiliation | Academia | ¹INRIA, ²Université Paris-Saclay, ³Institut de Mathématiques de Toulouse, Université de Toulouse; CNRS; UPS, F-31062 Toulouse Cedex 9, France, ⁴CEA. Correspondence to: Alexandre Blain <EMAIL>.
Pseudocode | Yes | Algorithm 1: Sampling order statistics of conformal p-values using Proposition 1. Algorithm 2: Computing the Empirical JER. Algorithm 3: Performing calibration on conformal p-values.
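The algorithms themselves are given in the paper; as background, the standard split-conformal p-value that such procedures operate on can be sketched as follows (function and variable names here are illustrative, not taken from the paper's code):

```python
def conformal_p_value(test_score, cal_scores):
    """Standard split-conformal p-value: the (smoothed) rank of a test
    point's nonconformity score among the calibration scores.

    A large nonconformity score (poor fit) yields a small p-value.
    """
    n = len(cal_scores)
    return (1 + sum(s >= test_score for s in cal_scores)) / (n + 1)

# Toy calibration scores; only one (0.8) is >= 0.5, so p = (1+1)/5 = 0.4.
cal = [0.1, 0.4, 0.35, 0.8]
p = conformal_p_value(0.5, cal)
```

Order statistics of such p-values over a batch of test points are what the paper's Algorithms 1-3 sample and calibrate against.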
Open Source Code | Yes | An implementation of CoJER is available at https://github.com/sanssouci-org/CoJER-paper, together with the code to reproduce the numerical results of this paper.
Open Datasets | Yes | We use 17 OpenML (Vanschoren et al., 2014) datasets from (Grinsztajn et al., 2022).
Dataset Splits | No | Each dataset is randomly split (n_split = 30 times) into a train, calibration, and test set. The paper specifies the creation of train, calibration, and test sets and the number of splits (30) but does not provide explicit proportions (e.g., 70/15/15) for these splits.
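A minimal sketch of the repeated random splitting described above. Since the paper does not report split proportions, the 50/25/25 fractions below are placeholder assumptions, not values from the paper:

```python
import random

def random_split(n, rng, frac_train=0.5, frac_cal=0.25):
    """Randomly partition indices 0..n-1 into disjoint train,
    calibration, and test sets. Fractions are illustrative only."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(frac_train * n)
    n_cal = int(frac_cal * n)
    train = idx[:n_train]
    cal = idx[n_train:n_train + n_cal]
    test = idx[n_train + n_cal:]
    return train, cal, test

rng = random.Random(0)
# n_split = 30 independent random splits, as stated in the paper.
splits = [random_split(1000, rng) for _ in range(30)]
```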
Hardware Specification | Yes | All experiments were performed using 40 CPUs (Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz).
Software Dependencies | No | The paper mentions models such as Random Forest, Multi-Layer Perceptron, Support Vector Regression, K-Nearest Neighbors, and Lasso, but does not provide specific version numbers for any software libraries or frameworks used (e.g., Python, scikit-learn, PyTorch versions).
Experiment Setup | Yes | We use α = 0.1 for all methods. For FCP-controlling methods, we set δ = 0.1 and use SCP with the largest level α′ such that FCP_{α′,δ} ≤ α.
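For readers unfamiliar with the target quantity: the false coverage proportion (FCP) is the fraction of test points whose prediction interval misses the true response, and (α, δ)-control asks that FCP exceed α with probability at most δ. A toy sketch of checking this empirically (the fcps values below are made-up illustrative numbers, not results from the paper):

```python
def fcp(covered):
    """False coverage proportion: fraction of test points whose
    prediction interval fails to cover the true response."""
    return sum(not c for c in covered) / len(covered)

# With alpha = delta = 0.1 (as in the paper's setup), FCP control
# requires P(FCP > alpha) <= delta across independent repetitions.
alpha, delta = 0.1, 0.1
fcps = [0.05, 0.08, 0.12, 0.07, 0.09, 0.06, 0.08, 0.10, 0.04, 0.09]
violation_rate = sum(f > alpha for f in fcps) / len(fcps)
controlled = violation_rate <= delta
```

This is only the evaluation criterion; the paper's contribution is choosing the SCP level so that this guarantee holds by construction.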