On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Authors: Baekjin Kim, Ambuj Tewari
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present some experimental results with perturbation-based algorithms (Alg. 1-(ii),(iii)) and compare them to the UCB1 algorithm in the simulated stochastic K-armed bandit. In all experiments, the number of arms (K) is 10, the number of different episodes is 1000, and true mean rewards (µ_i) are generated from U[0, 1] [19]. We consider the following four examples of 1-sub-Gaussian reward distributions that will be shifted by true mean µ_i: (a) Uniform, U[-1, 1], (b) Rademacher, R{-1, 1}, (c) Gaussian, N(0, 1), and (d) Gaussian mixture, W·N(-1, 1) + (1-W)·N(1, 1) where W ~ Bernoulli(1/2). Under each reward setting, we run five different algorithms: UCB1, RCB with Uniform and Rademacher, and FTPL via Gaussian N(0, σ²) and Double-exponential(σ), after we use grid search to tune confidence levels for confidence-based algorithms and the parameter σ for FTPL algorithms. |
| Researcher Affiliation | Academia | Baekjin Kim, Department of Statistics, University of Michigan, Ann Arbor, MI 48109, baekjin@umich.edu; Ambuj Tewari, Department of Statistics, University of Michigan, Ann Arbor, MI 48109, tewaria@umich.edu |
| Pseudocode | Yes | Algorithm 1: Randomized probability matching algorithms via Perturbation |
| Open Source Code | Yes | https://github.com/Kimbaekjin/Perturbation-Methods-StochasticMAB |
| Open Datasets | No | The paper mentions that true mean rewards are generated from U[0, 1] and uses specific reward distributions (Uniform, Rademacher, Gaussian, Gaussian Mixture), which are simulated/synthetic. It does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper states "In all experiments, the number of arms (K) is 10, the number of different episodes is 1000" and discusses reward distributions. However, it does not specify any training/test/validation dataset splits (percentages or sample counts) as it uses simulated data, not a pre-existing dataset with typical splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It only mentions running "simulated stochastic K-armed bandit" experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It refers to algorithms and distributions but no software dependencies. |
| Experiment Setup | Yes | In all experiments, the number of arms (K) is 10, the number of different episodes is 1000, and true mean rewards (µ_i) are generated from U[0, 1] [19]... after we use grid search to tune confidence levels for confidence-based algorithms and the parameter σ for FTPL algorithms. All tuned confidence levels and parameters are specified in Figure 1. |
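
To make the simulated setup quoted above concrete, here is a minimal sketch (not the authors' released code; see the repository linked above) of a stochastic K-armed bandit experiment comparing UCB1 with a Gaussian-perturbation FTPL rule. The function name `run_bandit`, the 1000-round horizon per episode, the episode count, and the exact form of the FTPL index (empirical mean plus σ-scaled Gaussian noise shrunk by √n_i) are illustrative assumptions rather than a verbatim reproduction of the paper's Algorithm 1.

```python
import numpy as np

def run_bandit(policy, K=10, T=1000, sigma=1.0, rng=None):
    """Simulate one episode of a stochastic K-armed bandit with
    Gaussian N(mu_i, 1) rewards and return the cumulative regret path."""
    rng = rng or np.random.default_rng()
    mu = rng.uniform(0.0, 1.0, size=K)            # true means drawn from U[0, 1]
    counts = np.zeros(K)                          # pulls per arm
    sums = np.zeros(K)                            # reward sums per arm
    regret = np.zeros(T)
    for t in range(T):
        if t < K:
            arm = t                               # pull each arm once to initialize
        else:
            means = sums / counts
            if policy == "ucb1":
                arm = np.argmax(means + np.sqrt(2 * np.log(t + 1) / counts))
            else:  # "ftpl-gaussian": perturb empirical means with fresh Gaussian noise
                arm = np.argmax(means + sigma * rng.standard_normal(K) / np.sqrt(counts))
        reward = mu[arm] + rng.standard_normal()  # 1-sub-Gaussian (unit Gaussian) reward noise
        counts[arm] += 1
        sums[arm] += reward
        regret[t] = mu.max() - mu[arm]
    return np.cumsum(regret)

# Average cumulative regret over independent episodes, in the spirit of the paper's setup.
episodes = 100
for policy in ("ucb1", "ftpl-gaussian"):
    paths = np.stack([run_bandit(policy) for _ in range(episodes)])
    print(policy, "final mean regret:", paths[:, -1].mean().round(2))
```

Swapping the `rng.standard_normal(K)` draw for Rademacher, uniform, or double-exponential noise would give rough analogues of the other perturbation variants the paper evaluates.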