Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Probabilistic Conformal Prediction with Approximate Conditional Validity

Authors: Vincent Plassier, Alexander Fishkov, Mohsen Guizani, Maxim Panov, Eric Moulines

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using extensive simulations, we show that our method consistently outperforms existing approaches in terms of conditional coverage, leading to more reliable statistical inference in a variety of applications. We demonstrate the effectiveness of the proposed method through experiments on synthetic and real-world datasets: it excels in classical regression problems, effectively addresses multimodality, and proves robust in the more challenging setting of multidimensional prediction tasks; see Section 4.
Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence; 2 CMAP, École Polytechnique; 3 Lagrange Mathematics and Computing Research Center; 4 Skolkovo Institute of Science and Technology
Pseudocode | Yes | Algorithm 1 CP2-PCP ... Algorithm 2 CP2-HPD
Open Source Code | Yes | Code of experiments can be found at https://github.com/stat-ml/conditional_cp
Open Datasets | Yes | Datasets. We use publicly available regression datasets, which are also considered in (Romano et al., 2019; Wang et al., 2023). Some of them come from the UCI repository: bike sharing (bike), protein structure (bio), blog feedback (blog), Facebook comments (fb1 and fb2). Other datasets come from US Department of Health surveys (meps19, meps20 and meps21), and from weather forecasts (temp; Cho et al. (2020)).
Dataset Splits | Yes | Specifically, we partition the data into two disjoint subsets: a training set, T = {(X_k, Y_k)}_{k=1}^m, and a calibration set, C = {(X_k, Y_k)}_{k=1}^n. ... The number of training and calibration samples is m = 10^4 and n = 10^3, respectively. ... Our experimental setup largely follows the approach outlined in (Wang et al., 2023). Specifically, we split each dataset into training, calibration, and testing sets. ... This process is repeated across 50 different random splits of each dataset.
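The quoted splitting protocol (m = 10^4 training samples, n = 10^3 calibration samples, repeated over 50 random splits) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `make_splits` and the synthetic data are hypothetical.

```python
import numpy as np

def make_splits(X, y, m=10_000, n=1_000, seed=0):
    """Randomly partition (X, y) into disjoint training and calibration
    sets of sizes m and n; the remaining points form the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, calib, test = idx[:m], idx[m:m + n], idx[m + n:]
    return (X[train], y[train]), (X[calib], y[calib]), (X[test], y[test])

# Toy stand-in data; the paper uses the UCI / MEPS / temp datasets.
X = np.random.randn(12_000, 5)
y = np.random.randn(12_000)

# Repeat over 50 different random splits, as in the quoted protocol.
for seed in range(50):
    (Xt, yt), (Xc, yc), (Xs, ys) = make_splits(X, y, seed=seed)
```

Each seed yields a fresh disjoint train/calibration/test partition, matching the "50 different random splits" statement above.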
Hardware Specification | No | The paper mentions that neural networks are trained and models are run but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing specifications.
Software Dependencies | No | We use the Mixture Density Network (Bishop, 1994) implementation from the CDE (Rothfuss et al., 2019) Python package as a base model for CP, PCP and CP2. For CQR (Romano et al., 2019) and CHR (Sesia & Romano, 2021) we use the original authors' implementation. For CPCG (Gibbs et al., 2023) we also use the original authors' implementation. For LCP (Guan, 2023) we again used the original author's implementation. For CDSplit+ we use the implementation from Wang et al. (2023). While various software implementations are mentioned, specific version numbers for these Python packages or libraries are not provided.
Experiment Setup | Yes | A Mixture Density Network (MDN) with 10 components is then trained to approximate the conditional distribution P_{Y|X}. ... The underlying neural network contains two hidden layers of 100 neurons each and was trained for 1000 epochs for each split of the data. The number of components of the Gaussian mixture was set to 10 for all datasets. ... The underlying neural network that outputs conditional quantiles consists of two hidden layers with 64 neurons each. Training was performed for 200 epochs with batch size 250.
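The MDN architecture quoted above (two hidden layers of 100 neurons each, 10 Gaussian mixture components) can be sketched as a plain NumPy forward pass. This is an illustrative reconstruction under those stated hyperparameters, not the CDE package's implementation; `init_mdn`, `mdn_forward`, and the tanh activation are assumptions.

```python
import numpy as np

K = 10          # mixture components, as reported for all datasets
HIDDEN = 100    # neurons per hidden layer, as reported

def init_mdn(d_in, hidden=HIDDEN, k=K, seed=0):
    """Initialize weights for two hidden layers plus an output layer
    producing 3*k values: mixture logits, means, and log-std-devs."""
    rng = np.random.default_rng(seed)
    sizes = [(d_in, hidden), (hidden, hidden), (hidden, 3 * k)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in sizes]

def mdn_forward(params, x):
    """Map inputs x to the weights, means, and stds of a K-component
    Gaussian mixture approximating P(Y | X = x)."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)              # two hidden layers
    W, b = params[-1]
    out = h @ W + b
    logits, mu, log_sigma = np.split(out, 3, axis=-1)
    pi = np.exp(logits - logits.max(-1, keepdims=True))
    pi = pi / pi.sum(-1, keepdims=True)     # mixture weights sum to 1
    return pi, mu, np.exp(log_sigma)        # sigma kept positive

params = init_mdn(d_in=5)
pi, mu, sigma = mdn_forward(params, np.random.default_rng(1).normal(size=(4, 5)))
```

Training this network for 1000 epochs per split (as quoted) would then fit the mixture parameters by maximizing the Gaussian-mixture log-likelihood of the training targets.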