Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Personalized Federated Conformal Prediction with Localization

Authors: Yinjie Min, Chuchen Zhang, Liuhua Peng, Changliang Zou

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on synthetic data and five real datasets to demonstrate marginal coverage validity of our proposed PFCP compared to Fed CP [23], Fed CP-QQ [16], CPlab [27] and CPhet [28] and its local coverage improvement relative to GLCP.
Researcher Affiliation	Academia	Yinjie Min, School of Statistics and Data Science, Nankai University, EMAIL; Chuchen Zhang, School of Statistics and Data Science, Nankai University, EMAIL; Liuhua Peng, School of Mathematics & Statistics, The University of Melbourne, EMAIL; Changliang Zou, School of Statistics and Data Science, LPMC and KLMDASR and LEBPS, Nankai University, EMAIL
Pseudocode	Yes	Algorithm 1: Local Training Procedure of Engression; Algorithm 2: Federated Training Procedure of Engression; Algorithm 3: Federated Density Ratio Estimation Procedure; Algorithm 4: Conditional Distribution Estimator Aggregation Procedure; Algorithm 5: Personalized Federated Conformal Prediction (PFCP)
Open Source Code	Yes	The codes are available in the repository https://github.com/OswinMin/PFCP.
Open Datasets	Yes	We evaluate our proposed PFCP method on five public-domain regression datasets also considered by [32, 33, 16]: physicochemical properties of protein tertiary structure (BIO) [30], bike sharing (BIKE) [8], communities and crimes (CRIME) [31], Tennessee's student teacher achievement ratio (STAR) [1], concrete compressive strength (CONCRETE) [43], and a derma image classification dataset (DERMA) [42].
Dataset Splits	Yes	The target agent dataset is divided into four parts: a predictor training set (I), a conditional distribution training set (II), a calibration dataset (III), and a test dataset (IV), where the first three datasets share the same data volume n. To ensure each agent has the same volume of available data, the source agent data set is divided into two parts (I) and (II), each with a data volume 3n/2.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'neural networks', 'logistic regression', 'Platt calibration', and 'engressor' but does not specify version numbers for these tools or any other software libraries.
Experiment Setup	Yes	We set n = 100 in this experiment. The pretrained predictors $\hat{\mu}(\cdot)$ and $\{\hat{\mu}_k(\cdot)\}_{k=1}^{K}$ are obtained using neural networks with same hidden layers 30 30. ... We set n = 50 for the CONCRETE dataset and n = 100 for other four datasets. In each repeated experiment, the number of source agents (excluding the target) varies across benchmark datasets: 20 for BIO, 12 for BIKE, 4 for CRIME, 8 for STAR, 5 for CONCRETE and 4 for DERMA. To ensure fair comparison, all agents use neural networks with identical architectures for point estimation, with hidden layers of size 30 30, and the engressor [35] is trained using the same network structure with hidden layers of 100 100, all trained for the same number of epochs and learning rate.
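
The split described in the Dataset Splits row can be illustrated with a short sketch. This is not the authors' code: the function names `split_target` and `split_source`, the dictionary keys, the random shuffling, and the choice to use all remaining target samples as the test set are assumptions made here for illustration. Only the part sizes (n, n, n for the target; 3n/2 and 3n/2 for each source agent) come from the quoted text.

```python
import random


def split_target(indices, n, seed=0):
    """Split target-agent indices into parts (I)-(IV).

    Parts (I)-(III) each hold n samples, as in the quoted text.
    Using the remainder as the test set (IV) is an assumption.
    """
    indices = list(indices)
    if len(indices) <= 3 * n:
        raise ValueError("need more than 3n target samples to leave a test set")
    rng = random.Random(seed)  # shuffling is an assumption, not stated in the source
    rng.shuffle(indices)
    return {
        "I_predictor_train": indices[:n],
        "II_conditional_train": indices[n:2 * n],
        "III_calibration": indices[2 * n:3 * n],
        "IV_test": indices[3 * n:],
    }


def split_source(indices, n, seed=0):
    """Split source-agent indices into parts (I) and (II), each of size 3n/2."""
    indices = list(indices)
    if (3 * n) % 2 != 0:
        raise ValueError("3n must be even so that 3n/2 is an integer")
    half = 3 * n // 2
    if len(indices) < 2 * half:
        raise ValueError("need at least 3n source samples")
    rng = random.Random(seed)
    rng.shuffle(indices)
    return {
        "I_predictor_train": indices[:half],
        "II_conditional_train": indices[half:2 * half],
    }


if __name__ == "__main__":
    # n = 100 matches the setting quoted in the Experiment Setup row;
    # the pool sizes 350 and 400 are arbitrary illustrative values.
    target = split_target(range(350), n=100)
    source = split_source(range(400), n=100)
    print({name: len(part) for name, part in target.items()})
    print({name: len(part) for name, part in source.items()})
```

With this layout, the target agent and every source agent each use 3n samples outside the test set, which is consistent with the stated goal that each agent has the same volume of available data.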