Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Diversity-Preserving $K$-Armed Bandits, Revisited
Authors: Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this final section, we perform some (simple and preliminary) experiments that merely illustrate the dual behavior of the regret: either bounded or growing at a ln T rate. We believe that a more extensive empirical comparison would be interesting but would be out of the scope we targeted for this article. (...) We ran the diversity-preserving UCB algorithm (see Box B, abbreviated Div PUCB below), as well as Algorithm 2 (Constrained-L1-OFUL, abbreviated L1-OFUL below) of Celis et al. (2019). We did so on each of the two problems να, over T = 100,000 time steps, for N = 100 runs. The expected regret suffered by each algorithm is estimated by the empirical averages of pseudo-regrets observed on the N runs: (...) Figures 1 report the estimates R_T(να) obtained (solid lines); shaded areas correspond to 2 standard errors of the series R_T(να, i) used in the definition of the R_T(να) as empirical averages. |
| Researcher Affiliation | Academia | Hédi Hadiji EMAIL L2S CNRS Centrale Supélec Université Paris-Saclay, Gif-sur-Yvette Sébastien Gerchinovitz EMAIL Institut de recherche technologique Saint Exupéry, Toulouse Institut de mathématiques de Toulouse, Université Paul Sabatier, Toulouse Jean-Michel Loubes EMAIL Institut de mathématiques de Toulouse, Université Paul Sabatier, Toulouse Gilles Stoltz EMAIL Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, Orsay, France HEC Paris, Jouy-en-Josas, France |
| Pseudocode | Yes | Box A: Protocol of diversity-preserving stochastic bandits (Celis et al., 2019) Box B: Diversity-preserving UCB for polytopes |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code, nor does it include links to a code repository. |
| Open Datasets | No | We consider K = 3 arms and the model D of Bernoulli distributions. (...) In this final section, we perform some (simple and preliminary) experiments that merely illustrate the dual behavior of the regret: either bounded or growing at a ln T rate. We believe that a more extensive empirical comparison would be interesting but would be out of the scope we targeted for this article. The paper uses synthetically generated data and does not provide access information for any public datasets. |
| Dataset Splits | No | The paper uses synthetic data for its experiments and describes the parameters of the problem and simulation (e.g., 'over T = 100,000 time steps, for N = 100 runs'). However, it does not specify explicit training, testing, or validation splits; this is typical of simulation studies, which do not rely on pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for implementing the algorithms or running experiments. |
| Experiment Setup | Yes | Experimental setting. We consider K = 3 arms and the model D of Bernoulli distributions. The diversity-preserving set P is the triangle generated by p(1) = (0, 0.2, 0.8), p(2) = (0.6, 0.2, 0.2), and p(3) = (0, 0.8, 0.2). We consider the bandit problems να with expectations µα = (1/2 − α, 1/3, 1/2 + α), where α ∈ {−0.1, 0.1}. (...) Numerical experiments. We ran the diversity-preserving UCB algorithm (...) over T = 100,000 time steps, for N = 100 runs. The expected regret suffered by each algorithm is estimated by the empirical averages of pseudo-regrets observed on the N runs. |
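
The experiment setup in the last row can be illustrated with a minimal Python sketch. It is not the authors' code (none was released) and it does not reproduce Box B: the exploration bonus `sqrt(2 ln t / N_a)`, the large index given to unpulled arms, the restriction of the search to the three vertices, and the names `mean_vector`, `best_vertex`, and `run_ucb_sketch` are all assumptions made for illustration. The expectations are read as µα = (1/2 − α, 1/3, 1/2 + α) with α ∈ {−0.1, 0.1}, and the horizon and number of runs are much smaller than the T = 100,000 and N = 100 reported above.

```python
import numpy as np

# Vertices p(1), p(2), p(3) of the diversity-preserving triangle P (from the setup row).
VERTICES = np.array([
    [0.0, 0.2, 0.8],
    [0.6, 0.2, 0.2],
    [0.0, 0.8, 0.2],
])


def mean_vector(alpha):
    """Expectations (1/2 - alpha, 1/3, 1/2 + alpha) of the three Bernoulli arms."""
    return np.array([0.5 - alpha, 1.0 / 3.0, 0.5 + alpha])


def best_vertex(values):
    """Index of a vertex maximizing the linear objective <p, values> over P.

    A linear function on a polytope attains its maximum at a vertex,
    so scanning the three vertices is enough.
    """
    return int(np.argmax(VERTICES @ values))


def run_ucb_sketch(alpha, horizon, rng):
    """Cumulative pseudo-regret of a simplified UCB-over-vertices strategy."""
    mu = mean_vector(alpha)
    optimal_value = np.max(VERTICES @ mu)
    counts = np.zeros(3)   # number of pulls of each arm
    sums = np.zeros(3)     # sum of rewards of each arm
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(1, horizon + 1):
        safe_counts = np.maximum(counts, 1.0)
        # Assumed exploration bonus; the index used in Box B may differ.
        bonus = np.sqrt(2.0 * np.log(t) / safe_counts)
        # Assumption: unpulled arms get a large finite index to force exploration.
        ucb = np.where(counts > 0, sums / safe_counts + bonus, 1e6)
        p_t = VERTICES[best_vertex(ucb)]      # distribution played at round t
        arm = rng.choice(3, p=p_t)            # arm drawn at random from p_t
        reward = float(rng.random() < mu[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += optimal_value - p_t @ mu     # pseudo-regret increment
        regret[t - 1] = total
    return regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (-0.1, 0.1):
        finals = np.array([run_ucb_sketch(alpha, 2000, rng)[-1] for _ in range(5)])
        stderr = finals.std(ddof=1) / np.sqrt(len(finals))
        # Report: alpha, optimal vertex number, mean final pseudo-regret, 2 standard errors.
        print(alpha, best_vertex(mean_vector(alpha)) + 1, finals.mean(), 2 * stderr)
```

Under this reading of the setup, the optimal distribution is p(2) for α = −0.1, which puts positive mass on every arm, and p(1) for α = 0.1, which puts zero mass on the first arm. This difference may be what drives the two regret regimes (bounded versus ln T) mentioned in the Research Type row, though the sketch itself makes no claim about rates.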