Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Stochastic bandits with arm-dependent delays
Authors: Manegueu Anne Gael, Claire Vernade, Alexandra Carpentier, Michal Valko
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 8. Experiments. We now evaluate the empirical performance of Patient Bandits. Throughout the section, we provide experiments where the delays of each arm i follow the Pareto Type I distribution with tail index αi, and where the rewards of each arm i follow a Bernoulli distribution with parameter µi. |
| Researcher Affiliation | Collaboration | ¹Otto-von-Guericke University of Magdeburg, DE; ²DeepMind, London, UK; ³DeepMind, Paris, FR. |
| Pseudocode | Yes | Algorithm 1 Patient Bandits |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or state that it is publicly available. |
| Open Datasets | No | The paper uses synthetically generated data based on Pareto Type I and Bernoulli distributions, and does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes simulation parameters like the horizon and number of runs but does not specify dataset splits (e.g., train/validation/test) in the context of fixed datasets. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, only general experimental settings. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | We consider a 2-arm setting with horizon T = 3000, with arm means µ = (0.5, 0.55) and with tail index α1 = 1, α2 = 0.3 respectively. We consider α ∈ [0.02, 0.5] and in Figure 2 show the regret as a function of α. ... For D-UCB we consider various threshold parameters m ∈ {10, 50, 100, 200}. ... We compare the algorithm Adapt-Patient Bandits launched with parameters (α = 0.1, c = 0.7, µ = 0.6) with the algorithm Patient Bandits launched with three different parameters α ∈ {0.1, 0.3, 0.6}. The maximal horizon is here T = 10000 and the results are averaged over 100 runs. |
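
Because the paper releases no code, the synthetic environment quoted in the Experiment Setup row could be approximated as in the sketch below. It uses the 2-arm configuration from the table (Bernoulli means 0.5 and 0.55, Pareto Type I tail indices 1 and 0.3, horizon T = 3000). Several things here are assumptions, not the authors' implementation: the function names `sample_pareto_delay` and `simulate_delayed_feedback` are hypothetical, the Pareto scale is taken to be 1, a reward pulled at round t with delay D is treated as observed only if t + D ≤ T, and arms are chosen uniformly at random as a placeholder rather than by Patient Bandits or D-UCB.

```python
import math
import random


def sample_pareto_delay(alpha, rng):
    """Draw a Pareto Type I delay (scale 1, tail index alpha) by inverse CDF.

    P(D > x) = x ** (-alpha) for x >= 1. The scale of 1 is an assumption.
    """
    u = 1.0 - rng.random()            # uniform on (0, 1]
    log_delay = -math.log(u) / alpha  # work in log space to avoid overflow
    return math.exp(log_delay) if log_delay < 700.0 else math.inf


def simulate_delayed_feedback(means, alphas, horizon, seed=0):
    """Simulate Bernoulli rewards with arm-dependent Pareto delays.

    Placeholder policy (assumption): arms are pulled uniformly at random.
    Returns per-arm pull counts, counts of rewards observed by the horizon,
    and the sum of those observed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms
    observed = [0] * n_arms
    reward_sum = [0] * n_arms
    for t in range(1, horizon + 1):
        arm = rng.randrange(n_arms)
        reward = 1 if rng.random() < means[arm] else 0
        delay = sample_pareto_delay(alphas[arm], rng)
        pulls[arm] += 1
        # Assumed feedback model: the reward is revealed at time t + delay.
        if t + delay <= horizon:
            observed[arm] += 1
            reward_sum[arm] += reward
    return pulls, observed, reward_sum


if __name__ == "__main__":
    pulls, observed, reward_sum = simulate_delayed_feedback(
        means=(0.5, 0.55), alphas=(1.0, 0.3), horizon=3000
    )
    for i in range(2):
        print(f"arm {i}: pulls={pulls[i]}, observed={observed[i]}, "
              f"sum of observed rewards={reward_sum[i]}")
```

Under these assumptions, the arm with the smaller tail index (heavier-tailed delays) has a larger share of its rewards still unobserved at the horizon, which is the difficulty the delay-aware algorithms in the table are meant to address.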