Ensemble sampling for linear bandits: small ensembles suffice
Authors: David Janz, Alexander Litvak, Csaba Szepesvari
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a d-dimensional stochastic linear bandit with an interaction horizon T, ensemble sampling with an ensemble of size of order d log T incurs regret at most of the order (d log T)5/2 T. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with T which defeats the purpose of ensemble sampling while obtaining near T order regret. Our result is also the first to allow for infinite action sets. |
| Researcher Affiliation | Academia | David Janz University of Oxford david.janz@stats.ox.ac.uk Alexander E. Litvak University of Alberta alitvak@ualberta.ca Csaba Szepesvári University of Alberta szepesva@ualberta.ca |
| Pseudocode | Yes | Algorithm 1 Linear ensemble sampling |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code release. |
| Open Datasets | No | The paper describes a theoretical problem setting (stochastic linear bandit setting) and does not use or provide access to an empirical dataset for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not include empirical experiments with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, defining algorithmic parameters (e.g., regularization parameter λ, ensemble size m, perturbation scales rt) as part of its theoretical framework, rather than describing an empirical experimental setup with hyperparameters or training configurations. |