Sub-sampling for Efficient Non-Parametric Bandit Exploration
Authors: Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we propose the first multi-armed bandit algorithm based on resampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling which requires to specify a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA) which combines the sub-sampling idea first used by the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models. |
| Researcher Affiliation | Academia | Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard — dorian.baudry@inria.fr, emilie.kaufmann@univ-lille.fr, odalric.maillard@inria.fr — Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France |
| Pseudocode | Yes | The pseudo-code of SP-SDA is given in Algorithm 1. |
| Open Source Code | Yes | The Python code used to perform these experiments is available on Github. |
| Open Datasets | No | The paper states 'we perform experiments on simulated data' and describes the parameters for generating Bernoulli and Gaussian bandit models (e.g., '4 different Bernoulli bandit models: 1) K = 2, µ = [0.8, 0.9], 2) K = 2, µ = [0.5, 0.6]', etc.). It does not use or provide access information for a named publicly available dataset. |
| Dataset Splits | No | The paper describes experiments on simulated bandit models and reports results 'at time T = 20000 based on 5000 independent runs'. In the context of bandit problems with simulated data and online learning, traditional train/validation/test dataset splits are not typically used or stated. No specific split percentages or sample counts for validation are mentioned. |
| Hardware Specification | No | The paper states 'Experiments presented in this paper were carried out using the Grid'5000 testbed'. While this identifies the computing infrastructure, it does not provide specific hardware details such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'The Python code used to perform these experiments is available on Github.' However, it does not specify any particular Python version or versions of libraries/dependencies used (e.g., PyTorch, NumPy, SciPy) to ensure reproducibility. |
| Experiment Setup | Yes | We ran experiments on 4 different Bernoulli bandit models: 1) K = 2, µ = [0.8, 0.9], 2) K = 2, µ = [0.5, 0.6], 3) K = 10, µ1 = 0.1, µ2,3,4 = 0.01, µ5,6,7 = 0.03, µ8,9,10 = 0.05, 4) K = 8, µ = [0.9, 0.85, ..., 0.85], and 3 different bandit models with N(µk, 1) arms with means: 1) K = 2, µ = [0.5, 0], 2) K = 4, µ = [0, 0, 0], 3) K = 4, µ = [1.5, 1, 0.5, 0]. For each experiment, Table 1 and Table 2 report an estimate of the regret at time T = 20000 based on 5000 independent runs. |
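To make the reported setup concrete, below is a minimal, hypothetical sketch of an SDA-style experiment on a simulated Bernoulli bandit, using random-block sub-sampling duels in the spirit of RB-SDA as summarized in the abstract. It is not the authors' released code; the exact duel, initialization, and tie-breaking rules here are simplifications, and the horizon is shortened for illustration.

```python
import numpy as np

def rb_sda_regret(means, T, seed=0):
    """Sketch of an RB-SDA-style run on Bernoulli arms.

    Each round, the leader (most-sampled arm) duels every challenger:
    the challenger's empirical mean is compared against the leader's
    mean over a random contiguous block of the same size (random-block
    sub-sampling). Winning challengers are pulled; if none win, the
    leader is pulled. Returns the empirical regret and per-arm pulls.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    history = [[rng.binomial(1, m)] for m in means]  # pull each arm once
    t = K
    while t < T:
        n = [len(h) for h in history]
        leader = int(np.argmax(n))
        winners = []
        for k in range(K):
            if k == leader:
                continue
            # random contiguous block of size n[k] from the leader's history
            start = rng.integers(0, n[leader] - n[k] + 1)
            block_mean = np.mean(history[leader][start:start + n[k]])
            if np.mean(history[k]) >= block_mean:
                winners.append(k)
        for k in (winners if winners else [leader]):
            history[k].append(rng.binomial(1, means[k]))
            t += 1
    pulls = np.array([len(h) for h in history])
    regret = np.max(means) * pulls.sum() - np.dot(means, pulls)
    return regret, pulls

# Example on the paper's first Bernoulli instance (shortened horizon);
# the paper averages regret at T = 20000 over 5000 independent runs.
regret, pulls = rb_sda_regret([0.8, 0.9], T=2000)
```

Note that the duel here pulls every challenger that wins its comparison, which matches the general SDA structure of sampling all non-leader winners in a round; a full reproduction would follow Algorithm 1 of the paper exactly.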