Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Transductive Conformal Inference for Full Ranking
Authors: Jean-Baptiste Fermanian, Pierre Humbert, Gilles Blanchard
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically show on both synthetic and real data the efficiency of our CP method for state-of-the-art ranking algorithms such as RankNet or LambdaMART. |
| Researcher Affiliation | Academia | Jean-Baptiste Fermanian, IMAG, IROKO, Univ. Montpellier, Inria, CNRS, Montpellier, France, EMAIL; Pierre Humbert, LPSM, Sorbonne Université, Paris, France, EMAIL; Gilles Blanchard, Université Paris Saclay, Institut Mathématique d'Orsay, Orsay, France, EMAIL |
| Pseudocode | Yes | Algorithm 1: Simulation of $R^{c+t}_{\mathrm{srt}}$; Algorithm 2: Quantile envelope procedure; Algorithm 3: Control of the FCP |
| Open Source Code | Yes | The code of our method is available at https://github.com/pierreHmbt/transductive-conformal-inference-for-ranking. |
| Open Datasets | Yes | Dataset: We evaluate our approach on the Yummly Food-10k data set, which consists of 12624 images of dishes embedded in $\mathbb{R}^{101}$. These embeddings have been constructed to reflect similarities in taste among the dishes (see Wilber et al., 2015, for a complete description). Footnote 1, companion website: http://vision.cornell.edu/se3/projects/concept-embeddings Dataset and ranking task: This dataset is composed of features of 16681 movies, characteristics of 15163 users and $10^6$ ratings associated with a tuple (user, movie) with values ranging from 0 to 10. Given the characteristics of a new user, the objective is to produce an ordered list of all the movies ranked by the user's level of interest. We aim to quantify the uncertainty of a smaller model relative to the performance of a larger one, which serves as a reference and defines the ground-truth full ordering of the items, i.e., $R^{c+t}_i$. More details are provided in Appendix E.2. Footnote 3: https://www.kaggle.com/datasets/ransakaravihara/anime-recommendation-ltr-dataset |
| Dataset Splits | Yes | We divide the data into training, calibration and test sets of respective sizes $n_{\mathrm{tr}} = 2624$, $n = 2000$ and $m = 8000$. |
| Hardware Specification | No | Our experiments do not require a lot of computational resources and run on a standard machine. |
| Software Dependencies | No | Note that in all the synthetic experiments, we use RankNet with a ReLU Neural Network (NN) of 5 hidden layers of size 10. This NN is trained using PyTorch [Paszke, 2019]. For all the experiments on this data set, we use RankNet with a ReLU NN of 5 hidden layers of size 10. This NN is trained using PyTorch [Paszke, 2019]. |
| Experiment Setup | Yes | Parameters: The parameters α, 1 − β and δ, equal to, respectively, the probability of miscoverage, the probability to control the FCP at level α, and the probability of the quantile envelope, are set to, respectively, 0.1, 0.75 and 0.02, in all our experiments. Note that in all the synthetic experiments, we use RankNet with a ReLU Neural Network (NN) of 5 hidden layers of size 10. For all the experiments on this data set, we use RankNet with a ReLU NN of 5 hidden layers of size 10. The reference model has 400 trees and 20 leaves, the smaller ones have trees = {50, 100, 200, 300} and leaves = {5, 10, 15, 20}. |
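
The split sizes in the Dataset Splits row sum to 12624, which matches the number of Yummly Food-10k items quoted in the Open Datasets row. A minimal sketch of such a three-way split is given below, using only the Python standard library. The function name `split_indices`, the uniform shuffle and the seed are assumptions made for illustration; the table does not describe how the authors actually draw the split, and their repository may do it differently.

```python
import random

# Split sizes quoted in the "Dataset Splits" row.
N_TRAIN, N_CALIB, N_TEST = 2624, 2000, 8000
# Number of Yummly Food-10k items quoted in the "Open Datasets" row.
N_TOTAL = 12624

# Parameter values quoted in the "Experiment Setup" row.
ALPHA = 0.1            # probability of miscoverage
ONE_MINUS_BETA = 0.75  # second listed parameter, read here as 1 - beta
DELTA = 0.02           # probability of the quantile envelope


def split_indices(n_total, n_train, n_calib, n_test, seed=0):
    """Shuffle range(n_total) and cut it into three disjoint index lists.

    Hypothetical helper: the shuffle and the seed are assumptions,
    not the authors' procedure.
    """
    if n_train + n_calib + n_test != n_total:
        raise ValueError("split sizes must sum to the number of items")
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = indices[:n_train]
    calib = indices[n_train:n_train + n_calib]
    test = indices[n_train + n_calib:]
    return train, calib, test


train_idx, calib_idx, test_idx = split_indices(N_TOTAL, N_TRAIN, N_CALIB, N_TEST)
print(len(train_idx), len(calib_idx), len(test_idx))  # prints: 2624 2000 8000
```

The three index lists are disjoint by construction, so the calibration and test items are never seen during training, which is the property a conformal calibration set relies on.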