One Practical Algorithm for Both Stochastic and Adversarial Bandits

Authors: Yevgeny Seldin, Aleksandrs Slivkins

ICML 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our results for the stochastic regime are supported by experimental validation." |
| Researcher Affiliation | Collaboration | Yevgeny Seldin (YEVGENY.SELDIN@GMAIL.COM), Queensland University of Technology, Brisbane, Australia; Aleksandrs Slivkins (SLIVKINS@MICROSOFT.COM), Microsoft Research, New York, NY, USA |
| Pseudocode | Yes | Algorithm 1: EXP3++. (A hedged code sketch follows the table.) |
| Open Source Code | No | The paper describes the algorithm but does not provide a link to, or an explicit statement about, the availability of its source code. |
| Open Datasets | No | The paper describes a synthetic data generation process ("stochastic multiarmed bandit problem with Bernoulli rewards... rewards are Bernoulli with bias 0.5 and for the single best arm the reward is Bernoulli with bias $0.5 + \Delta$") rather than using a pre-existing publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper gives the parameters of its simulation-based experiments (values of K, number of rounds, repetitions) but does not mention explicit train/validation/test splits as typically found in machine learning experiments with static datasets. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computing environments) used for running the experiments. |
| Software Dependencies | No | The paper compares against other algorithms (EXP3, UCB1, Thompson sampling) but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We run the experiments with K = 2, K = 10, and K = 100, and $\Delta = 0.1$ and $\Delta = 0.01$ (in total, six combinations of K and $\Delta$). We run each game for $10^7$ rounds and make ten repetitions of each experiment. In the experiments EXP3++ is parametrized by $\xi_t(a) = \ln\big(t\,\hat{\Delta}_t(a)^2\big) \big/ \big(32\, t\, \hat{\Delta}_t(a)^2\big)$, where $\hat{\Delta}_t(a)$ is the empirical estimate of the gap $\Delta(a)$ defined in (2). In order to demonstrate that in the stochastic regime the exploration parameters are in full control of the performance, we run EXP3++ with two different learning rates: EXP3++EMP corresponds to $\eta_t = \beta_t$ and EXP3++ACC corresponds to $\eta_t = 1$. (A simulation driver matching this setup follows the table.) |
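
Since the pseudocode row references Algorithm 1, here is a minimal Python sketch of EXP3++, not the authors' code: it follows the paper's mixing rule $\epsilon_t(a) = \min\{1/(2K),\ \beta_t,\ \xi_t(a)\}$ with $\beta_t = \tfrac{1}{2}\sqrt{\ln K / (tK)}$, the empirical gap estimate $\hat{\Delta}_t(a) = \min\{1, \tfrac{1}{t}(\hat{L}_t(a) - \min_{a'} \hat{L}_t(a'))\}$ of eq. (2), and the experimental tuning of $\xi_t(a)$ quoted in the table. The clipping of $\xi_t(a)$ at zero and all variable names are our assumptions.

```python
import numpy as np

def exp3pp(mu, T, rng, accelerated=False):
    """Sketch of EXP3++ on a Bernoulli bandit with mean rewards `mu`.

    accelerated=False -> eta_t = beta_t (the EXP3++EMP run in the paper);
    accelerated=True  -> eta_t = 1     (the EXP3++ACC run).
    """
    K = len(mu)
    L_hat = np.zeros(K)              # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(1, T + 1):
        beta = 0.5 * np.sqrt(np.log(K) / (t * K))
        eta = 1.0 if accelerated else beta
        # Empirical gap estimates, eq. (2) of the paper:
        gap = np.minimum(1.0, (L_hat - L_hat.min()) / t)
        # Experimental tuning of xi_t(a) quoted in the table; clipping at
        # zero for near-zero gaps is our assumption, not the paper's text.
        with np.errstate(divide="ignore", invalid="ignore"):
            xi = np.log(t * gap ** 2) / (32.0 * t * gap ** 2)
        xi = np.maximum(np.nan_to_num(xi, nan=0.0, posinf=0.0, neginf=0.0), 0.0)
        eps = np.minimum(np.minimum(1.0 / (2.0 * K), beta), xi)  # per-arm exploration
        rho = np.exp(-eta * (L_hat - L_hat.min()))               # exponential weights
        rho /= rho.sum()
        p = (1.0 - eps.sum()) * rho + eps                        # sampling distribution
        a = rng.choice(K, p=p)
        loss = float(rng.random() >= mu[a])                      # Bernoulli reward -> 0/1 loss
        L_hat[a] += loss / p[a]                                  # importance weighting
        total_loss += loss
    return total_loss
```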
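
A driver matching the reported setup could then look as follows. The horizon and seed are scaled-down placeholders (the paper runs $10^7$ rounds with ten repetitions), so the printed numbers only illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000                                # the paper uses 10**7 rounds
for K in (2, 10, 100):
    for delta in (0.1, 0.01):
        mu = np.full(K, 0.5)
        mu[0] += delta                    # single best arm with bias 0.5 + Delta
        loss = exp3pp(mu, T, rng)
        regret = loss - T * (1.0 - mu[0])  # realized regret vs. the best arm
        print(f"K={K:3d}  Delta={delta:4.2f}  regret~{regret:8.1f}")
```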