Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Transductive Conformal Inference for Full Ranking

Authors: Jean-Baptiste Fermanian, Pierre Humbert, Gilles Blanchard

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we empirically show on both synthetic and real data the efficiency of our CP method for state-of-the-art ranking algorithms such as RankNet or LambdaMART.
Researcher Affiliation	Academia	Jean-Baptiste Fermanian: IMAG, IROKO, Univ. Montpellier, Inria, CNRS, Montpellier, France; Pierre Humbert: LPSM, Sorbonne Université, Paris, France; Gilles Blanchard: Université Paris-Saclay, Institut Mathématique d'Orsay, Orsay, France
Pseudocode	Yes	Algorithm 1: Simulation of $R^{c+t}_{\mathrm{srt}}$; Algorithm 2: Quantile envelope procedure; Algorithm 3: Control of the FCP
Open Source Code	Yes	The code of our method is available at https://github.com/pierreHmbt/transductive-conformal-inference-for-ranking.
Open Datasets	Yes	Dataset: We evaluate our approach on the Yummly Food-10k data set, which consists of 12624 images of dishes embedded in $\mathbb{R}^{101}$. These embeddings have been constructed to reflect similarities in taste among the dishes (see Wilber et al., 2015, for a complete description; companion website: http://vision.cornell.edu/se3/projects/concept-embeddings). ... Dataset and ranking task: This dataset (https://www.kaggle.com/datasets/ransakaravihara/anime-recommendation-ltr-dataset) is composed of features of 16681 movies, characteristics of 15163 users, and $10^6$ ratings, each associated with a (user, movie) tuple and taking values from 0 to 10. Given the characteristics of a new user, the objective is to produce an ordered list of all the movies ranked by the user's level of interest. We aim to quantify the uncertainty of a smaller model relative to the performance of a larger one, which serves as a reference and defines the ground-truth full ordering of the items, i.e., $R^{c+t}_i$. More details are provided in Appendix E.2.
Dataset Splits	Yes	We divide the data into training, calibration, and test sets of respective sizes $n_{\mathrm{tr}} = 2624$, $n = 2000$, and $m = 8000$.
Hardware Specification	No	Our experiments do not require a lot of computational resources and run on a standard machine.
Software Dependencies	No	In all the synthetic experiments, as well as in all the experiments on this data set, we use RankNet with a ReLU neural network (NN) of 5 hidden layers of size 10. This NN is trained using PyTorch [Paszke, 2019].
Experiment Setup	Yes	Parameters: The parameters $\alpha$, $1-\beta$, and $\delta$, equal respectively to the probability of miscoverage, the probability of controlling the FCP at level $\alpha$, and the probability of the quantile envelope, are set to 0.1, 0.75, and 0.02, respectively, in all our experiments. In all the synthetic experiments, as well as in all the experiments on this data set, we use RankNet with a ReLU neural network (NN) of 5 hidden layers of size 10. The reference model has 400 trees and 20 leaves; the smaller ones have trees $\in \{50, 100, 200, 300\}$ and leaves $\in \{5, 10, 15, 20\}$.
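The Dataset Splits row reports a three-way partition of the 12624 Yummly items into training, calibration, and test sets. The following is a minimal sketch of such a partition. It assumes a uniformly random split with a fixed seed. The function name `split_indices` and the seed value are hypothetical, and the source does not state how the split was actually drawn.

```python
import random


def split_indices(n_total, n_train, n_cal, n_test, seed=0):
    """Randomly partition indices 0..n_total-1 into disjoint train/calibration/test lists."""
    if n_train + n_cal + n_test != n_total:
        raise ValueError("split sizes must sum to n_total")
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    idx = list(range(n_total))
    rng.shuffle(idx)
    train = idx[:n_train]
    cal = idx[n_train:n_train + n_cal]
    test = idx[n_train + n_cal:]
    return train, cal, test


# Sizes from the Dataset Splits row: 2624 + 2000 + 8000 = 12624 Yummly items.
train, cal, test = split_indices(12624, 2624, 2000, 8000, seed=42)
print(len(train), len(cal), len(test))  # 2624 2000 8000
```

Shuffling a single index list and slicing it guarantees that the three sets are disjoint and together cover every item exactly once.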
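The Software Dependencies and Experiment Setup rows describe the ranking model as RankNet with a ReLU network of 5 hidden layers of size 10. The sketch below illustrates that architecture and the standard RankNet pairwise loss in dependency-free Python rather than PyTorch. Several details are assumptions made for illustration and are not the authors' code: the helper names (`init_mlp`, `score`, `ranknet_loss`), the He-style initialization, the input dimension of 4, and a sigmoid scale of 1.

```python
import math
import random


def init_mlp(in_dim, hidden=10, n_hidden=5, seed=0):
    """Build weights for an MLP in_dim -> [hidden] * n_hidden -> 1 (assumed He-style init)."""
    rng = random.Random(seed)
    sizes = [in_dim] + [hidden] * n_hidden + [1]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)
        W = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def score(layers, x):
    """Forward pass: ReLU after every hidden layer, linear output giving one scalar score."""
    h = list(x)
    for k, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        h = z if k == len(layers) - 1 else [max(0.0, v) for v in z]
    return h[0]


def ranknet_loss(s_i, s_j):
    """RankNet loss when item i should rank above item j:
    -log(sigmoid(s_i - s_j)) = softplus(-(s_i - s_j)), computed in a numerically stable way."""
    d = s_i - s_j
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


layers = init_mlp(in_dim=4)
s_a = score(layers, [0.5, -1.0, 2.0, 0.0])
s_b = score(layers, [0.1, 0.3, -0.7, 1.2])
print(len(layers), ranknet_loss(s_a, s_b))
```

The loss is small when the preferred item already scores higher and grows roughly linearly as the pair becomes mis-ordered. Training would minimize it over sampled pairs, which PyTorch does via automatic differentiation.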
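The Experiment Setup row fixes the miscoverage level $\alpha = 0.1$ and refers to controlling the false coverage proportion (FCP). As a generic point of reference only, the sketch below computes two quantities:

- the usual split-conformal calibration quantile;
- an empirical FCP over rank intervals.

This is the textbook split-conformal construction, not the paper's transductive quantile-envelope procedure (Algorithms 2 and 3), and it does not use $\beta$ or $\delta$. The function names and the toy scores are hypothetical.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration
    score, or +inf when that rank exceeds n (too few calibration points)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def false_coverage_proportion(true_ranks, intervals):
    """Empirical FCP: fraction of test items whose true rank falls outside [lo, hi]."""
    misses = sum(1 for r, (lo, hi) in zip(true_ranks, intervals) if not lo <= r <= hi)
    return misses / len(true_ranks)


alpha = 0.1  # miscoverage level from the Experiment Setup row
q = conformal_quantile(list(range(2000)), alpha)  # toy calibration scores 0..1999
fcp = false_coverage_proportion([1, 5, 9], [(0, 2), (6, 8), (5, 10)])
print(q, fcp)
```

With $n = 2000$ calibration scores and $\alpha = 0.1$, the selected rank is $\lceil 2001 \times 0.9 \rceil = 1801$, so the threshold is the 1801st smallest score (1800 for the toy scores 0 to 1999). In the paper's setting, FCP control is stated with probability $1-\beta = 0.75$. This marginal construction does not by itself provide that guarantee.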