Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Diversity-Preserving $K$–Armed Bandits, Revisited

Authors: Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz

TMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this final section, we perform some (simple and preliminary) experiments that merely illustrate the dual behavior of the regret: either bounded or growing at a $\ln T$ rate. We believe that a more extensive empirical comparison would be interesting but would be out of the scope we targeted for this article. (...) We ran the diversity-preserving UCB algorithm (see Box B, abbreviated Div PUCB below), as well as Algorithm 2 (Constrained-L1-OFUL, abbreviated L1-OFUL below) of Celis et al. (2019). We did so on each of the two problems $\nu_\alpha$, over T = 100,000 time steps, for N = 100 runs. The expected regret suffered by each algorithm is estimated by the empirical averages of pseudo-regrets observed on the N runs: (...) Figure 1 reports the estimates $R_T(\nu_\alpha)$ obtained (solid lines); shaded areas correspond to 2 standard errors of the series $R_T(\nu_\alpha, i)$ used in the definition of the $R_T(\nu_\alpha)$ as empirical averages.
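The estimator quoted above (empirical averages of per-run pseudo-regrets, shown with a band of 2 standard errors) can be sketched as follows. This is a minimal NumPy sketch, assuming the N per-run cumulative pseudo-regret curves are stored as the rows of an (N, T) array and that "standard error" means the sample standard deviation (ddof=1) divided by √N; `regret_band` is a hypothetical helper name, not code released by the authors.

```python
import numpy as np


def regret_band(cum_regret: np.ndarray, n_se: float = 2.0):
    """Mean pseudo-regret curve with a +/- n_se standard-error band.

    cum_regret: array of shape (N, T); row i is the cumulative
    pseudo-regret curve observed on run i (assumed layout).
    Returns (mean, lower, upper), each of shape (T,).
    """
    n_runs = cum_regret.shape[0]
    mean = cum_regret.mean(axis=0)  # empirical average over the N runs
    # Standard error of the mean: sample std (ddof=1) / sqrt(N).
    se = cum_regret.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return mean, mean - n_se * se, mean + n_se * se
```

With N = 100 runs and T = 100,000 steps, the input would be a 100 × 100,000 array; plotting `mean` as a solid line and filling between `lower` and `upper` mirrors the figure description above.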
Researcher Affiliation	Academia	Hédi Hadiji: L2S, CNRS, CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette. Sébastien Gerchinovitz: Institut de recherche technologique Saint Exupéry, Toulouse; Institut de mathématiques de Toulouse, Université Paul Sabatier, Toulouse. Jean-Michel Loubes: Institut de mathématiques de Toulouse, Université Paul Sabatier, Toulouse. Gilles Stoltz: Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, Orsay, France; HEC Paris, Jouy-en-Josas, France.
Pseudocode	Yes	Box A: Protocol of diversity-preserving stochastic bandits (Celis et al., 2019); Box B: Diversity-preserving UCB for polytopes.
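Boxes A and B are not reproduced in this entry. As a rough illustration of the protocol as we understand it from Celis et al. (2019), the learner picks a distribution $p_t$ from the polytope $\mathcal{P}$ at each round, an arm is drawn from $p_t$, and only that arm's reward is observed. The Python sketch below assumes $\mathcal{P}$ is given by its vertices and rewards are Bernoulli, and it plugs in a simplified optimistic rule of our own choosing (per-arm UCB indices with a $\sqrt{2 \ln t / n}$ bonus, then the vertex maximizing $\langle p, \mathrm{UCB} \rangle$). It is not the paper's Box B, and `run_protocol` is a hypothetical name.

```python
import numpy as np


def run_protocol(vertices, mu, T, rng):
    """Simulate one run of the diversity-preserving bandit protocol.

    Each round: pick a distribution p_t from the polytope P, draw an arm
    A_t ~ p_t, and observe only A_t's Bernoulli reward.

    HYPOTHETICAL policy (not the paper's Box B): per-arm UCB indices,
    then the vertex of P with the largest optimistic value <p, UCB>.
    Returns the cumulative pseudo-regret sum_s (max_p <p, mu> - <p_s, mu>).
    """
    vertices = np.asarray(vertices, dtype=float)  # shape (m, K); rows are distributions
    mu = np.asarray(mu, dtype=float)              # true Bernoulli means, shape (K,)
    K = mu.size
    best_value = np.max(vertices @ mu)            # a linear max over P is attained at a vertex
    counts = np.zeros(K)                          # how often each arm was drawn
    sums = np.zeros(K)                            # total observed reward per arm
    cum_regret = np.empty(T)
    total = 0.0
    for t in range(1, T + 1):
        n = np.maximum(counts, 1.0)
        # UCB index; arms never observed get 1.0, the largest Bernoulli mean.
        ucb = np.where(counts > 0, sums / n + np.sqrt(2.0 * np.log(t) / n), 1.0)
        ucb = np.minimum(ucb, 1.0)                # means lie in [0, 1]
        p = vertices[np.argmax(vertices @ ucb)]   # optimistic vertex choice
        arm = rng.choice(K, p=p)                  # A_t ~ p_t
        reward = float(rng.random() < mu[arm])    # Bernoulli(mu[arm]) draw
        counts[arm] += 1
        sums[arm] += reward
        total += best_value - p @ mu              # pseudo-regret depends on p_t, not on A_t
        cum_regret[t - 1] = total
    return cum_regret
```

For example, `run_protocol([(0, 0.2, 0.8), (0.6, 0.2, 0.2), (0, 0.8, 0.2)], (0.4, 1/3, 0.6), 1000, np.random.default_rng(0))` returns a length-1000 cumulative pseudo-regret curve. Restricting the choice to vertices is a simplification that works because the optimistic objective is linear in $p$.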
Open Source Code	No	The paper does not provide any explicit statement about releasing code, nor does it include links to a code repository.
Open Datasets	No	We consider $K = 3$ arms and the model $\mathcal{D}$ of Bernoulli distributions. (...) In this final section, we perform some (simple and preliminary) experiments that merely illustrate the dual behavior of the regret: either bounded or growing at a $\ln T$ rate. We believe that a more extensive empirical comparison would be interesting but would be out of the scope we targeted for this article. The paper uses synthetically generated data and does not provide access information for any public datasets.
Dataset Splits	No	The paper uses synthetic data for its experiments and describes the parameters of the problem and the simulation (e.g., 'over T = 100,000 time steps, for N = 100 runs'). However, it does not specify explicit training, validation, or test splits; this is typical of simulation studies, as opposed to evaluations on pre-existing datasets.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers used for implementing the algorithms or running experiments.
Experiment Setup	Yes	Experimental setting. We consider $K = 3$ arms and the model $\mathcal{D}$ of Bernoulli distributions. The diversity-preserving set $\mathcal{P}$ is the triangle generated by $p^{(1)} = (0, 0.2, 0.8)$, $p^{(2)} = (0.6, 0.2, 0.2)$, and $p^{(3)} = (0, 0.8, 0.2)$. We consider the bandit problems $\nu_\alpha$ with expectations $\mu_\alpha = (1/2 - \alpha, 1/3, 1/2 + \alpha)$, where $\alpha \in \{-0.1, 0.1\}$. (...) Numerical experiments. We ran the diversity-preserving UCB algorithm (...) over T = 100,000 time steps, for N = 100 runs. The expected regret suffered by each algorithm is estimated by the empirical averages of pseudo-regrets observed on the N runs.
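To make the setting concrete, the snippet below evaluates the expected reward $\langle p, \mu_\alpha \rangle$ of each vertex of the triangle for both values of $\alpha$ and picks the best one. It reads the expectations as $\mu_\alpha = (1/2 - \alpha, 1/3, 1/2 + \alpha)$ with $\alpha \in \{-0.1, 0.1\}$; the helper names `mean_vector` and `best_vertex` are ours, and this is a worked check of the setting, not the authors' code.

```python
import numpy as np

# Vertices of the diversity-preserving triangle P, as listed in the setting.
P_VERTICES = np.array([
    [0.0, 0.2, 0.8],  # p(1)
    [0.6, 0.2, 0.2],  # p(2)
    [0.0, 0.8, 0.2],  # p(3)
])


def mean_vector(alpha):
    """Arm means mu_alpha = (1/2 - alpha, 1/3, 1/2 + alpha)."""
    return np.array([0.5 - alpha, 1.0 / 3.0, 0.5 + alpha])


def best_vertex(alpha):
    """Expected reward <p, mu_alpha> of every vertex and the index of the best.

    A linear function over a polytope attains its maximum at a vertex,
    so comparing the three vertices is enough.
    """
    values = P_VERTICES @ mean_vector(alpha)
    return values, int(np.argmax(values))


for alpha in (-0.1, 0.1):
    values, k = best_vertex(alpha)
    print(f"alpha={alpha}: values={np.round(values, 4)}, best=p({k + 1})")
```

Under these assumptions, the best vertex for $\alpha = -0.1$ is $p^{(2)} = (0.6, 0.2, 0.2)$, with value about 0.5067, and for $\alpha = 0.1$ it is $p^{(1)} = (0, 0.2, 0.8)$, with value about 0.5467. Note that $p^{(2)}$ puts positive probability on every arm, whereas $p^{(1)}$ never draws arm 1.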