Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
One Practical Algorithm for Both Stochastic and Adversarial Bandits
Authors: Yevgeny Seldin, Aleksandrs Slivkins
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results for the stochastic regime are supported by experimental validation. |
| Researcher Affiliation | Collaboration | Yevgeny Seldin EMAIL Queensland University of Technology, Brisbane, Australia Aleksandrs Slivkins EMAIL Microsoft Research, New York NY, USA |
| Pseudocode | Yes | Algorithm 1 Algorithm EXP3++. |
| Open Source Code | No | The paper describes the algorithm but does not provide a link or explicit statement about the availability of its source code. |
| Open Datasets | No | The paper describes a synthetic data generation process ('stochastic multiarmed bandit problem with Bernoulli rewards... rewards are Bernoulli with bias 0.5 and for the single best arm the reward is Bernoulli with bias 0.5 + Δ') rather than using a pre-existing publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes parameters for its simulation-based experiments (K values, number of rounds, repetitions) but does not mention explicit train/validation/test dataset splits as typically found in machine learning experiments with static datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper mentions comparing with other algorithms (EXP3, UCB1, Thompson sampling) but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We run the experiments with K = 2, K = 10, and K = 100, and Δ = 0.1 and Δ = 0.01 (in total, six combinations of K and Δ). We run each game for 10^7 rounds and make ten repetitions of each experiment. In the experiments EXP3++ is parametrized by ξ_t(a) = ln(t Δ̂_t(a)²) / (32 t Δ̂_t(a)²), where Δ̂_t(a) is the empirical estimate of Δ(a) defined in (2). In order to demonstrate that in the stochastic regime the exploration parameters are in full control of the performance we run the EXP3++ algorithm with two different learning rates. EXP3++^EMP corresponds to η_t = β_t and EXP3++^ACC corresponds to η_t = 1. |
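Since the paper provides pseudocode but no released source, the setup in the row above can be sketched in Python. This is a hedged approximation, not the authors' implementation: the function names, the use of average observed losses as the empirical gap estimate Δ̂_t(a), and the small-gap floor `1e-9` are assumptions of this sketch; only the ξ_t(a) formula, β_t, and the η_t = β_t (EXP3++^EMP) choice come from the paper's description.

```python
import math
import random

def exp3pp_probs(cum_losses, eps, eta):
    """EXP3++ arm probabilities: exponential weights plus per-arm exploration.

    cum_losses: cumulative importance-weighted loss estimates per arm.
    eps: per-arm exploration amounts (sum(eps) <= 1/2 by construction).
    """
    m = min(cum_losses)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (L - m)) for L in cum_losses]
    W = sum(w)
    s = sum(eps)
    return [(1.0 - s) * wi / W + ei for wi, ei in zip(w, eps)]

def run_exp3pp(loss_means, T, c=32.0, seed=0):
    """Simulate EXP3++ (EMP variant, eta_t = beta_t) on Bernoulli losses.

    Delta_hat is approximated by each arm's average observed loss minus the
    best average loss -- a simplification of the paper's estimator (2).
    Returns the learner's cumulative loss.
    """
    rng = random.Random(seed)
    K = len(loss_means)
    Ltil = [0.0] * K          # cumulative loss estimates
    pulls = [0] * K
    loss_sum = [0.0] * K
    total_loss = 0.0
    for t in range(1, T + 1):
        beta = 0.5 * math.sqrt(math.log(K) / (t * K))
        avg = [loss_sum[a] / pulls[a] if pulls[a] else 0.0 for a in range(K)]
        best = min(avg)
        eps = []
        for a in range(K):
            gap = max(avg[a] - best, 1e-9)           # empirical Delta_hat_t(a)
            xi = math.log(t * gap * gap) / (c * t * gap * gap)  # xi_t(a)
            eps.append(min(1.0 / (2 * K), beta, max(xi, 0.0)))
        p = exp3pp_probs(Ltil, eps, eta=beta)        # eta_t = beta_t
        a = rng.choices(range(K), weights=p)[0]
        loss = 1.0 if rng.random() < loss_means[a] else 0.0
        Ltil[a] += loss / p[a]                        # importance weighting
        pulls[a] += 1
        loss_sum[a] += loss
        total_loss += loss
    return total_loss
```

A run matching one of the paper's six configurations would be `run_exp3pp([0.5] * 9 + [0.4], T)` for K = 10 and Δ = 0.1 (stated here in losses rather than rewards), with T far smaller than the paper's 10^7 rounds for a quick check.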