One Practical Algorithm for Both Stochastic and Adversarial Bandits
Authors: Yevgeny Seldin, Aleksandrs Slivkins
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results for the stochastic regime are supported by experimental validation. |
| Researcher Affiliation | Collaboration | Yevgeny Seldin (YEVGENY.SELDIN@GMAIL.COM), Queensland University of Technology, Brisbane, Australia; Aleksandrs Slivkins (SLIVKINS@MICROSOFT.COM), Microsoft Research, New York, NY, USA |
| Pseudocode | Yes | Algorithm 1: Algorithm EXP3++. |
| Open Source Code | No | The paper describes the algorithm but does not provide a link or explicit statement about the availability of its source code. |
| Open Datasets | No | The paper describes a synthetic data generation process ('stochastic multiarmed bandit problem with Bernoulli rewards... rewards are Bernoulli with bias 0.5 and for the single best arm the reward is Bernoulli with bias 0.5 + Δ') rather than using a pre-existing, publicly available dataset with concrete access information. A sketch of this reward-generation process appears after the table. |
| Dataset Splits | No | The paper describes parameters for its simulation-based experiments (K values, number of rounds, repetitions) but does not mention explicit train/validation/test dataset splits as typically found in machine learning experiments with static datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions comparing with other algorithms (EXP3, UCB1, Thompson's sampling) but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We run the experiments with K = 2, K = 10, and K = 100, and Δ = 0.1 and Δ = 0.01 (in total, six combinations of K and Δ). We run each game for 10^7 rounds and make ten repetitions of each experiment. In the experiments EXP3++ is parametrized by ξ_t(a) = ln(t·Δ̂_t(a)²) / (32·t·Δ̂_t(a)²), where Δ̂_t(a) is the empirical estimate of Δ(a) defined in (2). In order to demonstrate that in the stochastic regime the exploration parameters are in full control of the performance, we run the EXP3++ algorithm with two different learning rates: EXP3++^EMP corresponds to η_t = β_t and EXP3++^ACC corresponds to η_t = 1. A hedged code sketch of this parametrization follows the table. |
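The exploration parametrization quoted in the Experiment Setup row, combined with an exponential-weights distribution, is enough to sketch how the sampling probabilities could be formed each round. The Python sketch below is not the paper's Algorithm 1: the constant β_t = (1/2)·√(ln K / (tK)), the per-arm cap min{1/(2K), β_t, ξ_t(a)}, and all function and variable names are assumptions based on our reading of the paper.

```python
import numpy as np

def exp3pp_probs(cum_loss_est, gap_est, t, K, eta=None):
    """Sampling distribution for round t of an EXP3++-style learner (sketch).

    Assumed (not quoted verbatim from the paper's Algorithm 1):
      beta_t   = 0.5 * sqrt(ln K / (t * K))
      eps_t(a) = min{ 1/(2K), beta_t, xi_t(a) }
    The exploration term xi_t(a) = ln(t * gap^2) / (32 * t * gap^2) is the
    parametrization quoted in the Experiment Setup row.
    """
    beta_t = 0.5 * np.sqrt(np.log(K) / (t * K))
    eta_t = beta_t if eta is None else eta  # EXP3++^EMP: eta_t = beta_t

    # xi_t(a); arms with a tiny or zero gap estimate get an undefined or
    # negative value here, which we send to +inf so the cap below takes over.
    with np.errstate(divide="ignore", invalid="ignore"):
        xi = np.log(t * gap_est ** 2) / (32.0 * t * gap_est ** 2)
    xi = np.where(np.isfinite(xi) & (xi > 0), xi, np.inf)

    eps = np.minimum(np.minimum(np.full(K, 1.0 / (2 * K)), beta_t), xi)

    # Exponential weights over cumulative importance-weighted loss estimates,
    # shifted by the minimum for numerical stability.
    rho = np.exp(-eta_t * (cum_loss_est - cum_loss_est.min()))
    rho /= rho.sum()

    return (1.0 - eps.sum()) * rho + eps
```

A full round would sample an arm from the returned vector, observe its loss, and update `cum_loss_est` for the played arm with an importance-weighted estimate (loss divided by the arm's probability); the precise update rule and the gap estimator Δ̂_t(a) are specified in Algorithm 1 and equation (2) of the paper.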
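For the data-generation process described in the Open Datasets and Experiment Setup rows, a minimal simulation harness might look as follows. The uniform placeholder policy, the shortened horizon, and the helper names are illustrative assumptions; the paper runs EXP3++ (and baselines such as EXP3, UCB1, and Thompson sampling) for 10^7 rounds with ten repetitions per configuration.

```python
import numpy as np

def bernoulli_bandit_means(K, delta, rng):
    """Reward means for the described stochastic setting: every arm has
    Bernoulli bias 0.5 except one best arm with bias 0.5 + delta."""
    means = np.full(K, 0.5)
    means[rng.integers(K)] += delta
    return means

rng = np.random.default_rng(0)
# The six (K, delta) combinations from the Experiment Setup row; the horizon
# is shortened here purely to keep the sketch cheap to run.
for K in (2, 10, 100):
    for delta in (0.1, 0.01):
        means = bernoulli_bandit_means(K, delta, rng)
        T = 10_000
        arms = rng.integers(K, size=T)            # placeholder uniform policy
        rewards = rng.random(T) < means[arms]     # observed Bernoulli rewards
        pseudo_regret = T * means.max() - means[arms].sum()
        print(f"K={K:>3}, delta={delta}: uniform-play pseudo-regret = "
              f"{pseudo_regret:.1f}, mean reward = {rewards.mean():.3f}")
```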