Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Constrained Pareto Set Identification with Bandit Feedback

Authors: Cyrille Kone, Emilie Kaufmann, Laura Richert

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our theoretical results are supported by an extensive empirical evaluation on a series of benchmarks. ... Finally, in Section 5 we validate our algorithms through an empirical evaluation on two real-world datasets from clinical trials. ... We report the distribution of the sample complexity in Figure 4.
Researcher Affiliation	Academia	¹Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9198 CRIStAL, F-59000 Lille, France; ²Univ. Bordeaux, Inserm, Inria, BPH, U1219, Sistm, F-33000 Bordeaux, France. Correspondence to: Cyrille Kone <EMAIL>.
Pseudocode	Yes	Algorithm 1 e-cAPE ... Algorithm 2 Game-cPSI
Open Source Code	No	The paper does not provide any specific links to source code repositories, nor does it explicitly state that the code for the described methodology is openly available or included in supplementary materials.
Open Datasets	Yes	Secukinumab trial. We use historical data from the phase 2 trial of Mark C et al. (2013)... Cov Boost 19 trial. We simulate a constrained PSI bandit instance using data from Munro et al. (2021)...
Dataset Splits	No	The paper describes experiments on simulated bandit instances derived from real-world trial data. In this bandit setting, algorithms sequentially sample arms. The concept of fixed training/test/validation dataset splits, typical for supervised learning, is not directly applicable or discussed in the context of the experimental setup.
Hardware Specification	Yes	The experiments were run on an ARM64 8GB RAM/8 core/256GB disk storage computer.
Software Dependencies	Yes	All the algorithms were mainly implemented in python 3.10 with some functions in cython for faster execution.
Experiment Setup	Yes	We set the error parameter to $\delta = 0.1$ and we reported a negligible empirical error. ... Following the literature, we use slightly smaller thresholds than those licensed by theory and use the standard $\beta_i := \sqrt{2\sigma_i^2 \log(\log(t)/\delta)/N_{t,i}}$. Moreover, for the confidence bounds on M and m, instead of using individual confidence intervals, we use confidence intervals on pairs, as in the experiments of Auer et al. (2016) and Kone et al. (2023), and set $\beta_{i,j} = \sqrt{\beta_i^2 + \beta_j^2}$.
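
The two confidence bonuses quoted in the Experiment Setup row can be computed as in the minimal sketch below. This is an illustration, not the authors' code (the paper releases none): the function names `beta` and `beta_pair`, the argument names, and the input checks are assumptions made here. `sigma` stands for the sub-Gaussian parameter $\sigma_i$, `t` for the current round, and `n_pulls` for $N_{t,i}$.

```python
import math


def beta(sigma: float, t: int, n_pulls: int, delta: float = 0.1) -> float:
    """Per-arm bonus sqrt(2 * sigma^2 * log(log(t) / delta) / N_{t,i}).

    Hypothetical helper: the name and the validation are assumptions,
    only the formula is taken from the quoted setup.
    """
    if n_pulls <= 0:
        raise ValueError("n_pulls must be positive")
    ratio = math.log(t) / delta
    if ratio < 1.0:
        # log(ratio) would be negative, so the square root is undefined.
        raise ValueError("need log(t) >= delta")
    return math.sqrt(2.0 * sigma ** 2 * math.log(ratio) / n_pulls)


def beta_pair(beta_i: float, beta_j: float) -> float:
    """Pairwise bonus sqrt(beta_i^2 + beta_j^2)."""
    return math.hypot(beta_i, beta_j)


if __name__ == "__main__":
    b_i = beta(sigma=1.0, t=100, n_pulls=10)
    b_j = beta(sigma=1.0, t=100, n_pulls=40)
    print(b_i, b_j, beta_pair(b_i, b_j))
```

With $\delta = 0.1$ the bonus is defined from round $t = 2$ onward, and it shrinks as an arm is pulled more often. The pairwise bonus is never smaller than either individual bonus.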