Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Authors: Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | However, numerical investigation reveals interesting behaviors. In Figure 1, we simulate several different algorithms over 400 simulations, for two pairs of (T, k) in the many-armed regime. Our extensive simulations in Section 6 and in the longer version of the paper [7] show that these insights are robust to varying reward and prior distributions. Indeed, similar results are obtained with Bernoulli rewards and general beta priors. Further, we observe that the same phenomenon arises in the contextual MAB setting, in simulations with synthetic and real-world data.
Researcher Affiliation | Collaboration | Mohsen Bayati (Stanford University, bayati@stanford.edu); Nima Hamidi (Stanford University, hamidi@stanford.edu); Ramesh Johari (Stanford University, rjohari@stanford.edu); Khashayar Khosravi (Google Research, NYC, khosravi@google.com)
Pseudocode | Yes | Algorithm 1: Asymptotically Optimal UCB... Algorithm 2: Subsampled UCB (SS-UCB)... Algorithm 3: Greedy... Algorithm 4: Subsampled Greedy (SS-Greedy). A minimal sketch of the subsampled-greedy idea appears after the table.
Open Source Code | Yes | Our code is available at http://github.com/khashayarkhv/many-armed-bandit.
Open Datasets | Yes | We use the Letter Recognition Dataset [12] from the UCI repository.
Dataset Splits | No | The paper describes the experimental setup in terms of the time horizon (T) and the number of arms (k) for the multi-armed bandit problem, and the generation of synthetic reward functions and arm parameters. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) typically associated with machine learning model training.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instance types) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments.
Experiment Setup | Yes | In Figure 1, we simulate several different algorithms over 400 simulations, for two pairs of (T, k) in the many-armed regime... Rewards are generated according to N(µi, 1), with µi ∼ U[0, 1]. The list of algorithms included is as follows: (1) UCB: Algorithm 1; (2) SS-UCB: Algorithm 2 with m = √T; (3) Greedy: Algorithm 3; (4) SS-Greedy: Algorithm 4 with m = T^(2/3) (see Theorem 5); (5) UCB-F: the UCB-F algorithm of [27] with the choice of confidence set Et = 2 log(10 log t); (6) TS: the Thompson Sampling algorithm [26, 22, 2]; and (7) SS-TS: subsampled TS with m = √T. We use the Letter Recognition Dataset [12] from the UCI repository... We generate k = 300 arms with parameters θ1, θ2, . . . , θk ∈ R^d (d will be specified shortly) and generate the reward of arm i via Y_it = X_t^⊤ θ_i + ε_it. We consider two experiments with d = 2 and d = 6... We generate 50 different instances, where we pick T = 8000 samples at random (from the original 20000 samples) and generate the arm parameters according to the uniform distribution on the ℓ2-ball in R^d, i.e., θ_i ∼ U_d = {u ∈ R^d : ‖u‖_2 ≤ 1}. Illustrative sketches of the synthetic and contextual setups follow the table.
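
To make the Pseudocode row concrete, here is a minimal sketch of the subsampled-greedy idea (Algorithm 4): subsample m of the k arms, pull each subsampled arm once, then always pull the arm with the highest empirical mean. The function name `ss_greedy` and the regret bookkeeping are ours for illustration, not the authors' released implementation.

```python
import numpy as np

def ss_greedy(mu, T, m, rng=None):
    """Minimal SS-Greedy sketch: subsample m arms, then act greedily.

    mu : true mean rewards of the k arms (rewards are drawn from N(mu_i, 1)).
    Returns the cumulative pseudo-regret against the best of all k arms.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu)
    k = len(mu)
    arms = rng.choice(k, size=min(m, k), replace=False)  # subsampling step
    counts = np.zeros(len(arms))
    sums = np.zeros(len(arms))
    regret = 0.0
    best = mu.max()
    for t in range(T):
        if t < len(arms):
            j = t                          # pull each subsampled arm once
        else:
            j = np.argmax(sums / counts)   # greedy: highest empirical mean
        reward = rng.normal(mu[arms[j]], 1.0)
        counts[j] += 1
        sums[j] += reward
        regret += best - mu[arms[j]]
    return regret
```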
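The next sketch mirrors the synthetic setup quoted in the Research Type and Experiment Setup rows (µi ∼ U[0, 1], rewards N(µi, 1), SS-Greedy with m = T^(2/3)), reusing the `ss_greedy` function above. The specific (T, k) pair and the number of runs below are placeholders, not the values behind the paper's Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, n_runs = 2000, 2000, 10           # illustrative values only
m = int(np.ceil(T ** (2 / 3)))          # subsampling size suggested by Theorem 5

regrets = []
for _ in range(n_runs):
    mu = rng.uniform(0.0, 1.0, size=k)  # arm means drawn from the U[0, 1] prior
    regrets.append(ss_greedy(mu, T, m, rng))
print(f"mean per-run regret over {n_runs} runs: {np.mean(regrets):.1f}")
```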
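Finally, a sketch of the contextual reward model quoted in the Experiment Setup row: arm parameters θ_i drawn uniformly from the unit ℓ2-ball in R^d and rewards Y_it = X_t^⊤ θ_i + ε_it. The contexts X_t below are random stand-ins; the paper builds them from the UCI Letter Recognition Dataset, which is not downloaded here, and `sample_unit_ball` is our helper name.

```python
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from {u in R^d : ||u||_2 <= 1}."""
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(n, 1)) ** (1.0 / d)   # radius ~ U^(1/d) gives a uniform ball
    return directions * radii

rng = np.random.default_rng(1)
d, k, T = 2, 300, 8000                     # the paper uses d in {2, 6}, k = 300, T = 8000
theta = sample_unit_ball(k, d, rng)        # arm parameters theta_1, ..., theta_k
X = rng.normal(size=(T, d))                # placeholder contexts (real ones come from the dataset)
noise = rng.normal(size=(T, k))
Y = X @ theta.T + noise                    # Y[t, i] = X_t . theta_i + eps_it
```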