Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Generalizing while preserving monotonicity in comparison-based preference learning models

Authors: Julien Fageot, Peva Blanchard, Gilles Bareilles, Lê-Nguyên Hoang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that this monotonicity is far from being a general guarantee, and that our new class of generalizing models improves accuracy, especially when the dataset is limited. ... Finally, we evaluate the statistical performance of our learning algorithms through numerical experiments. ... Section 5 reports on our experiments. ... Appendix A provides complementary experiments on real-world data.
Researcher Affiliation	Collaboration	Julien Fageot (Tournesol); Peva Blanchard (Kleis Technology); Gilles Bareilles (CTU in Prague); Lê-Nguyên Hoang (Calicarpa, Tournesol)
Pseudocode	No	The paper describes mathematical frameworks, definitions, and proofs. It does not contain any explicit pseudocode blocks, algorithm figures, or structured, step-by-step procedures formatted as an algorithm.
Open Source Code	Yes	The code is available at https://github.com/pevab/gbtlab2, and will be made publicly after the review process. ... The code to reproduce experiments is available at https://github.com/pevab/gbtlab2.
Open Datasets	Yes	The real-world data contains comparisons between Youtube videos made by various users, from the Tournesol platform [17].
Dataset Splits	Yes	Figure 3 reports the empirical risks of the two models, using a 10-fold cross validation scheme over the dataset D (1000 comparisons).
Hardware Specification	Yes	We run experiments on a personal laptop with 16GB of RAM and a 2.10 GHz processor.
Software Dependencies	No	The paper discusses mathematical models and experimental results but does not specify any particular software libraries, frameworks, or their version numbers that were used for implementation or analysis.
Experiment Setup	Yes	We shall only consider the uniform root law $f = \tfrac{1}{2}\mathbf{1}_{[-1,1]}$ and set $\sigma = 1$. ... Data are generated with (f , x , L , σ ) = ($\tfrac{1}{2}\mathbf{1}_{[-1,1]}$, I x T , 0, 1), where x is a one-hot encoding matrix (see Section 3.3). ... Both models have the uniform distribution in $[-1, 1]$ as a root law, $f(r) = \tfrac{1}{2}\mathbf{1}_{[-1,1]}(r)$, and the same Gaussian prior. We do not use any Laplacian regularization.
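
For readers unfamiliar with the two setup details quoted in the table, the following is a minimal illustrative sketch, not the authors' implementation (their code is the repository listed under Open Source Code). It shows the uniform root law density f(r) = 1/2 on [-1, 1] and a 10-fold partition of 1000 comparison indices. The function names `uniform_root_law` and `k_fold_indices`, the shuffling step, and the seed are assumptions made here for illustration only.

```python
import random


def uniform_root_law(r: float) -> float:
    """Density of the uniform distribution on [-1, 1]: 1/2 inside, 0 outside."""
    return 0.5 if -1.0 <= r <= 1.0 else 0.0


def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Split range(n) into k disjoint test folds and return (train, test) pairs.

    Shuffling with a fixed seed is an assumption of this sketch; the source
    only states that a 10-fold cross-validation scheme is used.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


# 1000 comparisons, as in the Dataset Splits row.
splits = k_fold_indices(1000, k=10)
```

Each of the 10 pairs holds 900 training indices and 100 test indices, and every comparison appears in exactly one test fold.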