Sub-sampling for Efficient Non-Parametric Bandit Exploration
Authors: Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we propose the first multi-armed bandit algorithm based on resampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling which requires to specify a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA) which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models. |
| Researcher Affiliation | Academia | Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard — dorian.baudry@inria.fr, emilie.kaufmann@univ-lille.fr, odalric.maillard@inria.fr — Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France |
| Pseudocode | Yes | The pseudo-code of SP-SDA is given in Algorithm 1. |
| Open Source Code | Yes | The Python code used to perform these experiments is available on Github. |
| Open Datasets | No | The paper states 'we perform experiments on simulated data' and describes the parameters for generating Bernoulli and Gaussian bandit models (e.g., '4 different Bernoulli bandit models: 1) K = 2, µ = [0.8, 0.9], 2) K = 2, µ = [0.5, 0.6]', etc.). It does not use or provide access information for a named publicly available dataset. |
| Dataset Splits | No | The paper describes experiments on simulated bandit models and reports results 'at time T = 20000 based on 5000 independent runs'. In the context of bandit problems with simulated data and online learning, traditional train/validation/test dataset splits are not typically used or stated. No specific split percentages or sample counts for validation are mentioned. |
| Hardware Specification | No | The paper states 'Experiments presented in this paper were carried out using the Grid'5000 testbed'. While this identifies the computing infrastructure, it does not provide specific hardware details such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'The Python code used to perform these experiments is available on Github.' However, it does not specify any particular Python version or versions of libraries/dependencies used (e.g., PyTorch, NumPy, SciPy) to ensure reproducibility. |
| Experiment Setup | Yes | We ran experiments on 4 different Bernoulli bandit models: 1) K = 2, µ = [0.8, 0.9], 2) K = 2, µ = [0.5, 0.6], 3) K = 10, µ1 = 0.1, µ2,3,4 = 0.01, µ5,6,7 = 0.03, µ8,9,10 = 0.05, 4) K = 8, µ = [0.9, 0.85, ..., 0.85], and 3 different bandit models with N(µk, 1) arms with means: 1) K = 2, µ = [0.5, 0], 2) K = 4, µ = [0, 0, 0], 3) K = 4, µ = [1.5, 1, 0.5, 0]. For each experiment, Table 1 and Table 2 report an estimate of the regret at time T = 20000 based on 5000 independent runs. |
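To make the reported setup concrete, below is a minimal, hypothetical sketch of an SDA-style experiment on a simulated Bernoulli bandit, using random-block sub-sampling duels in the spirit of RB-SDA as summarized in the abstract. It is not the authors' released code; the exact duel, initialization, and tie-breaking rules here are simplifications, and the horizon is shortened for illustration.

```python
import numpy as np

def rb_sda_regret(means, T, seed=0):
    """Sketch of an RB-SDA-style run on Bernoulli arms.

    Each round, the leader (most-sampled arm) duels every challenger:
    the challenger's empirical mean is compared against the leader's
    mean over a random contiguous block of the same size (random-block
    sub-sampling). Winning challengers are pulled; if none win, the
    leader is pulled. Returns the empirical regret and per-arm pulls.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    history = [[rng.binomial(1, m)] for m in means]  # pull each arm once
    t = K
    while t < T:
        n = [len(h) for h in history]
        leader = int(np.argmax(n))
        winners = []
        for k in range(K):
            if k == leader:
                continue
            # random contiguous block of size n[k] from the leader's history
            start = rng.integers(0, n[leader] - n[k] + 1)
            block_mean = np.mean(history[leader][start:start + n[k]])
            if np.mean(history[k]) >= block_mean:
                winners.append(k)
        for k in (winners if winners else [leader]):
            history[k].append(rng.binomial(1, means[k]))
            t += 1
    pulls = np.array([len(h) for h in history])
    regret = np.max(means) * pulls.sum() - np.dot(means, pulls)
    return regret, pulls

# Example on the paper's first Bernoulli instance (shortened horizon);
# the paper averages regret at T = 20000 over 5000 independent runs.
regret, pulls = rb_sda_regret([0.8, 0.9], T=2000)
```

Note that the duel here pulls every challenger that wins its comparison, which matches the general SDA structure of sampling all non-leader winners in a round; a full reproduction would follow Algorithm 1 of the paper exactly.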