Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Authors: Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare our algorithm BABA with KL-UCB (Garivier & Cappé, 2011) under two reward distributions, i.e., Gaussian distribution and Bernoulli distribution. For each distribution, we test BABA and KL-UCB with 2 arms and 5 arms respectively. ... All the experiments are averaged over 2000 repetitions. (See the KL-UCB sketch below the table.)
Researcher Affiliation | Academia | (1) School of Computing, National University of Singapore, Singapore; (2) Data Science and Analytics Thrust, The Hong Kong University of Science and Technology; (3) Department of Computer Science, University of California, Los Angeles, CA 90095, USA.
Pseudocode | Yes | Algorithm 1: Batched Anytime Bandit Alg. (BABA); Algorithm 2: UNIFORMEXPLORATION; Algorithm 3: INITIALEXPLOITATION; Algorithm 4: OPTIMISTICEXPLORATION; Algorithm 5: CONFIDENTEXPLORATION; Algorithm 6: CONFIDENTEXPLOITATION
Open Source Code | No | The paper does not provide any links to source code or statements about its public availability.
Open Datasets | No | The paper describes experiments conducted using 'Gaussian distribution' and 'Bernoulli distribution' with specified parameters (µ, σ, p). These are synthetic distributions, not named public datasets with concrete access information (links, DOIs, or citations to data repositories).
Dataset Splits | No | The paper studies an online learning problem (multi-armed bandits) and describes experiments in terms of the number of pulls T and the resulting regret. It does not involve a fixed dataset with explicit training, validation, and test splits.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not specify any software names with version numbers used for its experiments.
Experiment Setup | Yes | For our BABA algorithm, we set α = 3 and I1 = 2000. All the experiments are averaged over 2000 repetitions. ... For 2-arm setting, we set µ = {1, 0} and σ = 1 for Gaussian distribution; we set p = {0.5, 0.25} for Bernoulli distribution. For 5-arm setting, we set µ = {1, 0.5, 0.5, 0.5, 0.5} and σ = 1 for Gaussian distribution; we set p = {0.5, 0.25, 0.25, 0.25, 0.25} for Bernoulli distribution. (See the setup sketch below the table.)
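For context on the Research Type and Dataset Splits rows: the quantity the paper's experiments measure is the standard cumulative regret over T pulls of a K-armed bandit with mean rewards µ_1, …, µ_K. The textbook definition is given below as background only; it is not a quotation from the paper.

$$
R(T) \;=\; T\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T}\mu_{A_t}\right],
\qquad \mu^{*}=\max_{1\le a\le K}\mu_a,
$$

where $A_t$ denotes the arm pulled at round $t$.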
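The Research Type row quotes the paper's comparison of BABA against the KL-UCB baseline of Garivier & Cappé (2011). As background on that baseline (not code from the paper), the following is a minimal sketch of the KL-UCB upper-confidence index for Bernoulli rewards, computed by bisection; the constant c and the tolerance are assumed values, not ones reported in the paper.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def klucb_index(empirical_mean, pulls, t, c=3.0, tol=1e-6):
    """KL-UCB index for Bernoulli rewards (illustrative sketch).

    Returns the largest q >= empirical_mean with
        pulls * kl(empirical_mean, q) <= log(t) + c * log(log(t)),
    found by bisection. The constant c and tolerance are assumed choices.
    """
    log_t = math.log(max(t, 2))  # guard t = 1 so log(log(t)) is defined
    threshold = (log_t + c * math.log(log_t)) / pulls
    lo, hi = empirical_mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(empirical_mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At each round, a KL-UCB policy would pull the arm with the largest such index and update that arm's empirical mean and pull count.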
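The Experiment Setup row lists the arm parameters and the 2000-repetition averaging used in the paper's plots. Below is a minimal sketch of how those synthetic Gaussian and Bernoulli environments could be instantiated and how per-run regret would be averaged. The policy interface (`select_arm`/`update`) is a hypothetical placeholder standing in for BABA or KL-UCB, whose internals are not reproduced here.

```python
import numpy as np

# Arm parameters quoted in the Experiment Setup row above.
GAUSSIAN_2ARM = dict(mu=[1.0, 0.0], sigma=1.0)
GAUSSIAN_5ARM = dict(mu=[1.0, 0.5, 0.5, 0.5, 0.5], sigma=1.0)
BERNOULLI_2ARM = dict(p=[0.5, 0.25])
BERNOULLI_5ARM = dict(p=[0.5, 0.25, 0.25, 0.25, 0.25])

REPETITIONS = 2000  # "All the experiments are averaged over 2000 repetitions."


def run_once(policy, horizon, means, sample, rng):
    """One bandit run of `horizon` pulls; returns the cumulative regret.

    `policy` is a hypothetical object exposing select_arm(t) and
    update(arm, reward); `sample(arm, rng)` draws one reward for the arm.
    """
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = policy.select_arm(t)
        reward = sample(arm, rng)
        policy.update(arm, reward)
        regret += best - means[arm]
    return regret


def average_regret(make_policy, horizon, means, sample, reps=REPETITIONS, seed=0):
    """Average the cumulative regret over `reps` independent repetitions."""
    rng = np.random.default_rng(seed)
    return float(np.mean([run_once(make_policy(), horizon, means, sample, rng)
                          for _ in range(reps)]))


# Reward samplers for the two distributions used in the paper's experiments.
def gaussian_sampler(mu, sigma):
    return lambda arm, rng: rng.normal(mu[arm], sigma)


def bernoulli_sampler(p):
    return lambda arm, rng: float(rng.random() < p[arm])
```

For example, `average_regret(lambda: MyPolicy(5), horizon, GAUSSIAN_5ARM["mu"], gaussian_sampler(GAUSSIAN_5ARM["mu"], GAUSSIAN_5ARM["sigma"]))` would reproduce the averaging loop for the 5-arm Gaussian setting, given any policy implementation (`MyPolicy` is a hypothetical name).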