Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
Authors: Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | However, numerical investigation reveals interesting behaviors. In Figure 1, we simulate several different algorithms over 400 simulations, for two pairs of (T, k) in the many-armed regime. Our extensive simulations in Section 6 and in the longer version of the paper [7] show that these insights are robust to varying reward and prior distributions. Indeed, similar results are obtained with Bernoulli rewards and general beta priors. Further, we observe that the same phenomenon arises in the contextual MAB setting, via simulations with synthetic and real-world data. |
| Researcher Affiliation | Collaboration | Mohsen Bayati (Stanford University, bayati@stanford.edu); Nima Hamidi (Stanford University, hamidi@stanford.edu); Ramesh Johari (Stanford University, rjohari@stanford.edu); Khashayar Khosravi (Google Research, NYC, khosravi@google.com) |
| Pseudocode | Yes | Algorithm 1 Asymptotically Optimal UCB... Algorithm 2 Subsampled UCB (SS-UCB)... Algorithm 3 Greedy... Algorithm 4 Subsampled Greedy (SS-Greedy) (a minimal SS-Greedy sketch follows the table) |
| Open Source Code | Yes | Our code is available at http://github.com/khashayarkhv/many-armed-bandit. |
| Open Datasets | Yes | We use the Letter Recognition Dataset [12] from the UCI repository. |
| Dataset Splits | No | The paper describes the experimental setup in terms of time horizon (T) and number of arms (k) for the multi-armed bandit problem, and the generation of synthetic reward functions and arm parameters. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) typically associated with machine learning model training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | In Figure 1, we simulate several different algorithms over 400 simulations, for two pairs of (T, k) in the many-armed regime... Rewards are generated according to N(µ_i, 1), with µ_i ∼ U[0, 1]. The list of algorithms included is as follows. (1) UCB: Algorithm 1, (2) SS-UCB: Algorithm 2 with m = √T, (3) Greedy: Algorithm 3, (4) SS-Greedy: Algorithm 4 with m = T^(2/3) (see Theorem 5), (5) UCB-F: UCB-F algorithm of [27] with the choice of confidence set E_t = 2 log(10 log t), (6) TS: Thompson Sampling algorithm [26, 22, 2], and (7) SS-TS: subsampled TS with m = √T. We use the Letter Recognition Dataset [12] from the UCI repository... We generate k = 300 arms with parameters θ_1, θ_2, ..., θ_k ∈ R^d (d will be specified shortly) and generate the reward of arm i via Y_it = X_t^⊤ θ_i + ε_it. We consider two experiments with d = 2 and d = 6... We generate 50 different instances, where we pick T = 8000 samples at random (from the original 20000 samples) and generate the arm parameters according to the uniform distribution on the ℓ2-ball in R^d, i.e., θ_i ∼ U_d = {u ∈ R^d : ‖u‖_2 ≤ 1}. (A small simulation sketch of the synthetic setup follows the table.) |
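
As a reading aid for the pseudocode row above, here is a minimal Python sketch of the Subsampled Greedy idea (subsample m arms, then act greedily on the subsample). It is our own illustration, not the authors' released code; the function name `ss_greedy`, the `reward_fn` interface, and the round-robin initialization are assumptions of this sketch.

```python
import numpy as np

def ss_greedy(reward_fn, k, T, m, rng=None):
    """Sketch of Subsampled Greedy: subsample m of the k arms uniformly at
    random, pull each subsampled arm once, then always pull the arm with the
    highest empirical mean reward."""
    if rng is None:
        rng = np.random.default_rng()
    arms = rng.choice(k, size=min(m, k), replace=False)  # the random subsample
    sums = np.zeros(len(arms))    # cumulative reward per subsampled arm
    pulls = np.zeros(len(arms))   # number of pulls per subsampled arm
    rewards = []
    for t in range(T):
        # Round-robin over the subsample first, then greedy by empirical mean.
        j = t if t < len(arms) else int(np.argmax(sums / pulls))
        r = reward_fn(arms[j])    # reward_fn(i) is assumed to return a noisy reward for arm i
        sums[j] += r
        pulls[j] += 1
        rewards.append(r)
    return arms, rewards
```

Under the same assumptions, passing m ≈ T^(2/3) matches the subsampling rate quoted for SS-Greedy in the experiment-setup row, while m = k reduces the sketch to plain Greedy.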
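
The synthetic part of the experiment setup (µ_i ∼ U[0, 1], rewards N(µ_i, 1), pseudo-regret averaged over repeated runs) can likewise be sketched in a few lines. The (T, k) values and run count below are deliberately small placeholders, not the paper's 400-simulation protocol, and the `simulate_once` and `greedy` helpers are hypothetical names introduced for illustration.

```python
import numpy as np

def simulate_once(T, k, policy, rng):
    """One run of the synthetic setup quoted above: arm means mu_i ~ U[0, 1],
    rewards ~ N(mu_i, 1). `policy(t, sums, pulls)` returns the arm to pull.
    Returns the pseudo-regret against the best realized mean."""
    mu = rng.uniform(0.0, 1.0, size=k)
    sums, pulls = np.zeros(k), np.zeros(k)
    regret = 0.0
    for t in range(T):
        i = policy(t, sums, pulls)
        sums[i] += mu[i] + rng.normal()   # N(mu_i, 1) reward
        pulls[i] += 1
        regret += mu.max() - mu[i]
    return regret

def greedy(t, sums, pulls):
    # Plain Greedy: pull each arm once, then always play the highest empirical mean.
    untried = np.where(pulls == 0)[0]
    return int(untried[0]) if len(untried) else int(np.argmax(sums / pulls))

# Deliberately small placeholder run (the paper averages over 400 simulations).
rng = np.random.default_rng(0)
regrets = [simulate_once(T=2000, k=200, policy=greedy, rng=rng) for _ in range(20)]
print(f"mean pseudo-regret of Greedy: {np.mean(regrets):.1f}")
```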