Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Transductive Conformal Inference for Full Ranking

Authors: Jean-Baptiste Fermanian, Pierre Humbert, Gilles Blanchard

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically show on both synthetic and real data the efficiency of our CP method for state-of-the-art ranking algorithms such as RankNet or LambdaMART.
Researcher Affiliation | Academia | Jean-Baptiste Fermanian (IMAG, IROKO, Univ. Montpellier, Inria, CNRS, Montpellier, France; EMAIL); Pierre Humbert (LPSM, Sorbonne Université, Paris, France; EMAIL); Gilles Blanchard (Université Paris Saclay, Institut Mathématique d'Orsay, Orsay, France; EMAIL)
Pseudocode | Yes | Algorithm 1: Simulation of R^{c+t}_{srt}. Algorithm 2: Quantile envelope procedure. Algorithm 3: Control of the FCP.
Open Source Code | Yes | The code of our method is available at https://github.com/pierreHmbt/transductive-conformal-inference-for-ranking.
Open Datasets | Yes | Dataset: We evaluate our approach on the Yummly Food-10k data set which consists in 12624 images of dishes embedded in R^101. These embeddings have been constructed to reflect similarities in taste among the dishes (see Wilber et al., 2015 for a complete description; footnote 1, companion website: http://vision.cornell.edu/se3/projects/concept-embeddings). Dataset and ranking task: This dataset is composed of features of 16681 movies, characteristics of 15163 users and 10^6 ratings associated with a tuple (user, movie) with values ranging from 0 to 10. Given the characteristics of a new user, the objective is to produce an ordered list of all the movies ranked by the user's level of interest. We aim to quantify the uncertainty of a smaller model relative to the performance of a larger one, which serves as a reference and defines the ground-truth full ordering of the items, i.e., R^{c+t}_i. More details are provided in Appendix E.2. (Footnote 3: https://www.kaggle.com/datasets/ransakaravihara/anime-recommendation-ltr-dataset)
Dataset Splits | Yes | We divide the data into training, calibration and test sets of respective sizes n_tr = 2624, n = 2000 and m = 8000.
Hardware Specification | No | Our experiments do not require a lot of computational resources and run on a standard machine.
Software Dependencies | No | Note that in all the synthetic experiments, we use RankNet with a ReLU Neural Network (NN) of 5 hidden layers of size 10. This NN is trained using PyTorch [Paszke, 2019]. For all the experiments on this data set, we use RankNet with a ReLU NN of 5 hidden layers of size 10. This NN is trained using PyTorch [Paszke, 2019].
Experiment Setup | Yes | Parameters: The parameters α, 1 − β and δ, equal to, respectively, the probability of miscoverage, the probability to control the FCP at level α, and the probability of the quantile envelope, are set to, respectively, 0.1, 0.75 and 0.02 in all our experiments. Note that in all the synthetic experiments, we use RankNet with a ReLU Neural Network (NN) of 5 hidden layers of size 10. For all the experiments on this data set, we use RankNet with a ReLU NN of 5 hidden layers of size 10. The reference model has 400 trees and 20 leaves; the smaller ones have trees = {50, 100, 200, 300} and leaves = {5, 10, 15, 20}.
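
The split sizes and parameter values quoted in the table can be collected into a small setup sketch. This is only an illustration in plain Python, not the authors' code: the helper `split_indices`, the constant names, the seed, and the use of a uniform random shuffle are all assumptions, as is the reading of the smaller models as a full grid over the listed tree and leaf counts.

```python
import random
from itertools import product

# Values quoted in the table above.
ALPHA = 0.1            # probability of miscoverage
ONE_MINUS_BETA = 0.75  # probability to control the FCP at level ALPHA
DELTA = 0.02           # probability of the quantile envelope
N_TRAIN, N_CALIB, N_TEST = 2624, 2000, 8000  # n_tr, n, m


def split_indices(n_items, sizes, seed=0):
    """Hypothetical helper: shuffle range(n_items) and cut it into
    consecutive, disjoint blocks of the requested sizes."""
    if sum(sizes) != n_items:
        raise ValueError("split sizes must sum to the number of items")
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)  # assumption: uniform random split
    blocks, start = [], 0
    for size in sizes:
        blocks.append(idx[start:start + size])
        start += size
    return blocks


# 12624 is the number of Yummly Food-10k images reported above.
train_idx, calib_idx, test_idx = split_indices(
    12624, (N_TRAIN, N_CALIB, N_TEST)
)

# Assumption: the smaller models form the full Cartesian grid of the
# listed values; the source only lists the two sets.
TREES = (50, 100, 200, 300)
LEAVES = (5, 10, 15, 20)
small_models = [
    {"trees": t, "leaves": l} for t, l in product(TREES, LEAVES)
]
reference_model = {"trees": 400, "leaves": 20}
```

The three split sizes sum exactly to the 12624 images reported for the Yummly data set, so the sketch treats the split as a partition of that data set.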