Stochastic bandits with arm-dependent delays

Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 8. Experiments: We now evaluate the empirical performance of Patient Bandits. Throughout the section, we provide experiments where the delays of each arm i follow a Pareto Type I distribution with tail index αi, and the rewards of each arm i follow a Bernoulli distribution with parameter µi.
Researcher Affiliation | Collaboration | 1 Otto von Guericke University Magdeburg, Germany; 2 DeepMind, London, UK; 3 DeepMind, Paris, France.
Pseudocode | Yes | Algorithm 1: Patient Bandits
Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or state that it is publicly available.
Open Datasets | No | The paper uses synthetically generated data based on Pareto Type I and Bernoulli distributions and does not refer to a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes simulation parameters such as the horizon and the number of runs but does not specify dataset splits (e.g., train/validation/test) in the context of fixed datasets.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, only general experimental settings.
Software Dependencies | No | The paper does not specify software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | We consider a 2-arm setting with horizon T = 3000, with arm means µ = (0.5, 0.55) and tail indices α1 = 1, α2 = 0.3 respectively. We consider α ∈ [0.02, 0.5] and in Figure 2 show the regret as a function of α. ... For D-UCB we consider various threshold parameters m ∈ {10, 50, 100, 200}. ... We compare the algorithm Adapt-Patient Bandits launched with parameters (α = 0.1, c = 0.7, µ = 0.6) with the algorithm Patient Bandits launched with three different parameters α ∈ {0.1, 0.3, 0.6}. The maximal horizon is here T = 10000 and the results are averaged over 100 runs.
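
As a rough illustration of the simulated environment quoted in the Research Type and Experiment Setup rows, the sketch below samples Bernoulli rewards and Pareto Type I delays for the reported 2-arm setting (µ = (0.5, 0.55), α1 = 1, α2 = 0.3, T = 3000). The inverse-CDF sampling routine, the random seed, and the uniform-exploration placeholder policy are assumptions for illustration; this is not the paper's Patient Bandits algorithm.

import numpy as np

# Hypothetical reconstruction of the simulated environment from Section 8:
# Bernoulli rewards and Pareto Type I delays for each arm. The uniform
# exploration policy below is a placeholder, not the paper's algorithm.
rng = np.random.default_rng(0)

T = 3000                      # horizon reported in the Experiment Setup row
mu = np.array([0.5, 0.55])    # Bernoulli reward means per arm
alpha = np.array([1.0, 0.3])  # Pareto Type I tail indices per arm

def sample_feedback(arm):
    # Reward ~ Bernoulli(mu[arm]); delay ~ Pareto Type I with scale 1 and
    # tail index alpha[arm], i.e. P(D > d) = d ** (-alpha[arm]) for d >= 1,
    # sampled here by inverting the CDF.
    reward = rng.binomial(1, mu[arm])
    delay = (1.0 - rng.uniform()) ** (-1.0 / alpha[arm])
    return reward, delay

# Play arms uniformly at random and record when each reward would arrive.
pending = []
for t in range(T):
    arm = rng.integers(len(mu))
    reward, delay = sample_feedback(arm)
    pending.append((t + delay, arm, reward))

# Only feedback that arrives before the horizon is observable.
arrived = [(arm, reward) for (s, arm, reward) in pending if s <= T]
print(f"{len(arrived)} of {T} reward observations arrived within the horizon")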
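
The remaining settings in the Experiment Setup row (the sweep over α, the D-UCB threshold parameters, and the Adapt-Patient Bandits comparison) can be gathered into a single configuration. The sketch below is a minimal, hypothetical layout: only the numeric values come from the row above, while the dictionary structure and the grid resolution are assumptions.

from numpy import linspace

# Hypothetical configuration mirroring the reported sweeps; the algorithms
# themselves are not implemented here.
experiments = {
    # Regret as a function of the tail index alpha (Figure 2), horizon T = 3000.
    "alpha_sweep": {
        "T": 3000,
        "mu": (0.5, 0.55),
        "alpha_grid": linspace(0.02, 0.5, 25).tolist(),  # grid resolution is an assumption
    },
    # D-UCB baseline run with several threshold parameters m.
    "ducb_thresholds": {"m": [10, 50, 100, 200]},
    # Adapt-Patient Bandits vs. Patient Bandits with fixed guesses of alpha.
    "adaptivity": {
        "adapt_params": {"alpha": 0.1, "c": 0.7, "mu": 0.6},
        "fixed_alphas": [0.1, 0.3, 0.6],
        "T": 10000,
        "n_runs": 100,  # results averaged over 100 runs
    },
}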