Stochastic bandits with arm-dependent delays

Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 8. Experiments: We now evaluate the empirical performance of Patient Bandits. Throughout the section, we provide experiments where the delays of each arm i follow a Pareto Type I distribution with tail index αi, and the rewards of each arm i follow a Bernoulli distribution with parameter µi.
Researcher Affiliation | Collaboration | 1 Otto von Guericke University Magdeburg, Germany; 2 DeepMind, London, UK; 3 DeepMind, Paris, France.
Pseudocode | Yes | Algorithm 1: Patient Bandits
Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or state that it is publicly available.
Open Datasets | No | The paper uses synthetically generated data based on Pareto Type I and Bernoulli distributions and does not refer to a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes simulation parameters such as the horizon and the number of runs but does not specify dataset splits (e.g., train/validation/test) in the context of fixed datasets.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, only general experimental settings.
Software Dependencies | No | The paper does not specify software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | We consider a 2-arm setting with horizon T = 3000, with arm means µ = (0.5, 0.55) and tail indices α1 = 1, α2 = 0.3 respectively. We consider α ∈ [0.02, 0.5] and in Figure 2 show the regret as a function of α. ... For D-UCB we consider various threshold parameters m ∈ {10, 50, 100, 200}. ... We compare the algorithm Adapt-Patient Bandits launched with parameters (α = 0.1, c = 0.7, µ = 0.6) with the algorithm Patient Bandits launched with three different parameters α ∈ {0.1, 0.3, 0.6}. The maximal horizon is here T = 10000 and the results are averaged over 100 runs.
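
As a rough illustration of the simulated environment quoted in the Research Type and Experiment Setup rows, the sketch below samples Bernoulli rewards and Pareto Type I delays for the reported 2-arm setting (µ = (0.5, 0.55), α1 = 1, α2 = 0.3, T = 3000). The inverse-CDF sampling routine, the random seed, and the uniform-exploration placeholder policy are assumptions for illustration; this is not the paper's Patient Bandits algorithm.

import numpy as np

# Hypothetical reconstruction of the simulated environment from Section 8:
# Bernoulli rewards and Pareto Type I delays for each arm. The uniform
# exploration policy below is a placeholder, not the paper's algorithm.
rng = np.random.default_rng(0)

T = 3000                      # horizon reported in the Experiment Setup row
mu = np.array([0.5, 0.55])    # Bernoulli reward means per arm
alpha = np.array([1.0, 0.3])  # Pareto Type I tail indices per arm

def sample_feedback(arm):
    # Reward ~ Bernoulli(mu[arm]); delay ~ Pareto Type I with scale 1 and
    # tail index alpha[arm], i.e. P(D > d) = d ** (-alpha[arm]) for d >= 1,
    # sampled here by inverting the CDF.
    reward = rng.binomial(1, mu[arm])
    delay = (1.0 - rng.uniform()) ** (-1.0 / alpha[arm])
    return reward, delay

# Play arms uniformly at random and record when each reward would arrive.
pending = []
for t in range(T):
    arm = rng.integers(len(mu))
    reward, delay = sample_feedback(arm)
    pending.append((t + delay, arm, reward))

# Only feedback that arrives before the horizon is observable.
arrived = [(arm, reward) for (s, arm, reward) in pending if s <= T]
print(f"{len(arrived)} of {T} reward observations arrived within the horizon")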
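
The remaining settings in the Experiment Setup row (the sweep over α, the D-UCB threshold parameters, and the Adapt-Patient Bandits comparison) can be gathered into a single configuration. The sketch below is a minimal, hypothetical layout: only the numeric values come from the row above, while the dictionary structure and the grid resolution are assumptions.

from numpy import linspace

# Hypothetical configuration mirroring the reported sweeps; the algorithms
# themselves are not implemented here.
experiments = {
    # Regret as a function of the tail index alpha (Figure 2), horizon T = 3000.
    "alpha_sweep": {
        "T": 3000,
        "mu": (0.5, 0.55),
        "alpha_grid": linspace(0.02, 0.5, 25).tolist(),  # grid resolution is an assumption
    },
    # D-UCB baseline run with several threshold parameters m.
    "ducb_thresholds": {"m": [10, 50, 100, 200]},
    # Adapt-Patient Bandits vs. Patient Bandits with fixed guesses of alpha.
    "adaptivity": {
        "adapt_params": {"alpha": 0.1, "c": 0.7, "mu": 0.6},
        "fixed_alphas": [0.1, 0.3, 0.6],
        "T": 10000,
        "n_runs": 100,  # results averaged over 100 runs
    },
}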