Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits
Authors: Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare our algorithm BABA with KL-UCB (Garivier & Cappé, 2011) under two reward distributions, i.e., Gaussian distribution and Bernoulli distribution. For each distribution, we test BABA and KL-UCB with 2 arms and 5 arms respectively. ... All the experiments are averaged over 2000 repetitions. |
| Researcher Affiliation | Academia | ¹School of Computing, National University of Singapore, Singapore; ²Data Science and Analytics Thrust, The Hong Kong University of Science and Technology; ³Department of Computer Science, University of California, Los Angeles, CA 90095, USA. |
| Pseudocode | Yes | Algorithm 1: Batched Anytime Bandit Alg. (BABA); Algorithm 2: UNIFORMEXPLORATION; Algorithm 3: INITIALEXPLOITATION; Algorithm 4: OPTIMISTICEXPLORATION; Algorithm 5: CONFIDENTEXPLORATION; Algorithm 6: CONFIDENTEXPLOITATION |
| Open Source Code | No | The paper does not provide any links to source code or statements about its public availability. |
| Open Datasets | No | The paper describes experiments conducted using 'Gaussian distribution' and 'Bernoulli distribution' with specified parameters (µ, σ, p). These are synthetic distributions and not named public datasets with concrete access information (links, DOIs, or citations to data repositories). |
| Dataset Splits | No | The paper studies an online learning problem (multi-armed bandits) and describes experiments in terms of 'number of pulls T' and 'regret'. It does not involve a fixed dataset with explicit training, validation, and test splits. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper does not specify any software names with version numbers used for its experiments. |
| Experiment Setup | Yes | For our BABA algorithm, we set α = 3 and I1 = 2000. All the experiments are averaged over 2000 repetitions. ... For 2-arm setting, we set µ ∈ {1, 0} and σ = 1 for Gaussian distribution; we set p ∈ {0.5, 0.25} for Bernoulli distribution. For 5-arm setting, we set µ ∈ {1, 0.5, 0.5, 0.5, 0.5} and σ = 1 for Gaussian distribution; we set p ∈ {0.5, 0.25, 0.25, 0.25, 0.25} for Bernoulli distribution. |
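
Since no source code is released, the synthetic environments in the Experiment Setup row can be sketched as follows. This is a minimal sketch, not the authors' code: the arm means and σ come from the table, while the names `SETTINGS`, `pull`, `round_robin`, `pseudo_regret`, and `average_regret` are hypothetical, the round-robin policy is only a placeholder (it is neither BABA nor KL-UCB), and measuring regret as the sum of mean gaps (pseudo-regret) is an assumption, since the table does not state the exact regret definition. The BABA-specific parameters α and I1 are not modeled.

```python
import random

# Arm parameters as listed in the Experiment Setup row.
SETTINGS = {
    ("gaussian", 2): [1.0, 0.0],
    ("gaussian", 5): [1.0, 0.5, 0.5, 0.5, 0.5],
    ("bernoulli", 2): [0.5, 0.25],
    ("bernoulli", 5): [0.5, 0.25, 0.25, 0.25, 0.25],
}
SIGMA = 1.0  # Gaussian standard deviation from the table


def pull(dist, mean, rng):
    """Draw one reward from an arm with the given mean."""
    if dist == "gaussian":
        return rng.gauss(mean, SIGMA)
    return 1.0 if rng.random() < mean else 0.0


def round_robin(t, counts, sums):
    """Placeholder policy (assumption, not BABA or KL-UCB): cycle through arms."""
    return t % len(counts)


def pseudo_regret(dist, num_arms, horizon, policy, seed=0):
    """Sum of mean gaps of the arms chosen over `horizon` pulls."""
    means = SETTINGS[(dist, num_arms)]
    best = max(means)
    rng = random.Random(seed)
    counts = [0] * num_arms   # pulls per arm, visible to the policy
    sums = [0.0] * num_arms   # cumulative reward per arm, visible to the policy
    regret = 0.0
    for t in range(horizon):
        arm = policy(t, counts, sums)
        reward = pull(dist, means[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def average_regret(dist, num_arms, horizon, policy, repetitions):
    """Average pseudo-regret over independent repetitions (one seed each)."""
    total = sum(
        pseudo_regret(dist, num_arms, horizon, policy, seed=r)
        for r in range(repetitions)
    )
    return total / repetitions
```

To mirror the reported protocol, a real policy would replace `round_robin` and `average_regret` would be called with `repetitions=2000` for each of the four settings.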