Stochastic bandits with arm-dependent delays
Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 8. Experiments: We now evaluate the empirical performance of Patient Bandits. Throughout the section, we provide experiments where the delays of each arm i follow the Pareto Type I distribution with tail index αi, and the rewards of each arm i follow a Bernoulli distribution with parameter µi. |
| Researcher Affiliation | Collaboration | 1Otto-von-Guericke University of Magdeburg, DE 2DeepMind, London, UK 3DeepMind, Paris, FR. |
| Pseudocode | Yes | Algorithm 1 Patient Bandits |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or state that it is publicly available. |
| Open Datasets | No | The paper uses synthetically generated data based on Pareto Type I and Bernoulli distributions, and does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes simulation parameters like the horizon and number of runs but does not specify dataset splits (e.g., train/validation/test) in the context of fixed datasets. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments, only general experimental settings. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | We consider a 2-arm setting with horizon T = 3000, with arm means µ = (0.5, 0.55) and with tail indices α1 = 1, α2 = 0.3 respectively. We consider α ∈ [0.02, 0.5], and Figure 2 shows the regret as a function of α. ... For D-UCB we consider various threshold parameters m ∈ {10, 50, 100, 200}. ... We compare the algorithm Adapt-Patient Bandits launched with parameters (α = 0.1, c = 0.7, µ = 0.6) with the algorithm Patient Bandits launched with three different parameters α ∈ {0.1, 0.3, 0.6}. The maximal horizon is here T = 10000 and the results are averaged over 100 runs. |
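Since the paper releases no code, the experimental setup quoted above can only be reconstructed from the stated distributions. The sketch below simulates the paper's 2-arm environment (Bernoulli(µi) rewards, Pareto Type I delays with tail index αi, horizon T = 3000) using inverse-CDF sampling for the delays; the function and variable names are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_type1_delay(alpha, size, rng):
    """Sample Pareto Type I delays: P(D > t) = t^(-alpha) for t >= 1,
    via inverse-CDF sampling D = U^(-1/alpha) with U ~ Uniform(0, 1)."""
    u = rng.random(size)
    return u ** (-1.0 / alpha)

# Parameters quoted from the paper's 2-arm experiment:
mu = np.array([0.5, 0.55])     # Bernoulli reward means
alpha = np.array([1.0, 0.3])   # Pareto tail indices (arm 2 is heavy-tailed)
T = 3000                       # horizon

# Simulate pulling arm 2 (index 1, the higher-mean arm) for T rounds.
rewards = rng.random(T) < mu[1]                 # Bernoulli(0.55) rewards
delays = pareto_type1_delay(alpha[1], T, rng)   # arm-dependent delays
# A reward generated at round t only becomes observable at round t + delay,
# so heavy-tailed delays (small alpha) censor much of the feedback.
observed_by_horizon = (np.arange(T) + delays) < T
print(f"fraction of feedback observed by T: {observed_by_horizon.mean():.2f}")
```

With α2 = 0.3 the delay distribution has infinite mean, which is exactly the regime where a sizeable share of feedback never arrives within the horizon and the delay-censoring studied in the paper matters.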