Optimal Batched Best Arm Identification

Authors: Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conduct numerical experiments to compare our proposed algorithms with the optimal sequential algorithm Track-and-Stop [17] and the batched algorithm Top-k δ-Elimination [22] on various problem instances.
Researcher Affiliation | Academia | Tianyuan Jin (1), Yu Yang (3), Jing Tang (2), Xiaokui Xiao (1), Pan Xu (3). (1) National University of Singapore, (2) The Hong Kong University of Science and Technology (Guangzhou), (3) Duke University. {tianyuan,xkxiao}@nus.edu.sg, jingtang@ust.hk, {yu.yang,pan.xu}@duke.edu
Pseudocode | Yes | Algorithm 1: Three-Batch Best Arm Identification (Tri-BBAI)
Open Source Code | Yes | The implementation of this work can be found at https://github.com/panxulab/Optimal-Batched-Best-Arm-Identification
Open Datasets | No | For all experiments in this section, we set the number of arms n = 10, where each arm has a Bernoulli reward distribution with mean µ_i for i ∈ [10]. More specifically, the mean rewards are generated by the following two cases. Uniform: the best arm has µ_1 = 0.5, and the mean rewards of the rest of the arms follow a uniform distribution over [0.2, 0.4], i.e., µ_i is uniformly generated from [0.2, 0.4] for i ∈ [n] \ {1}. Normal: the best arm has µ_1 = 0.6, and the mean rewards of the rest of the arms are first generated from a normal distribution N(0.2, 0.2) and then projected to the interval [0, 0.4].
Dataset Splits | No | The paper describes how the reward distributions for the bandit arms are generated for the experiments but does not specify any explicit training, validation, or test dataset splits (e.g., an 80/10/10 split or specific sample counts).
Hardware Specification | Yes | We perform all computations in Python on R9 5900HX for all our experiments.
Software Dependencies | No | The paper mentions performing computations "in Python" but does not specify the Python version or any other software dependencies with their respective version numbers (e.g., specific libraries or frameworks).
Experiment Setup | Yes | The hyperparameters of all methods are chosen as follows... For Tri-BBAI and Opt-BBAI, we set α = 1.0017 and ϵ = 0.01. We use the same β(t) function for Chernoff's stopping condition as in Track-and-Stop. Moreover, for the lengths of the batches, we set L1, L2, and L3 to be the values calculated by Theorem 3.1.
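Since the paper generates its bandit instances synthetically rather than using a public dataset, the problem instances described in the Open Datasets row above can be reproduced directly. The following is a minimal sketch (not the authors' code) of generating the two instance types with NumPy; whether N(0.2, 0.2) specifies the standard deviation or the variance is an assumption here (standard deviation is used below), and the helper names are illustrative.

```python
import numpy as np

def make_instance(case: str, n: int = 10, rng=None):
    """Generate mean rewards for n Bernoulli arms, following the paper's
    description. Illustrative sketch only, not the authors' implementation."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.empty(n)
    if case == "uniform":
        mu[0] = 0.5                                   # best arm
        mu[1:] = rng.uniform(0.2, 0.4, size=n - 1)    # rest ~ Uniform[0.2, 0.4]
    elif case == "normal":
        mu[0] = 0.6                                   # best arm
        rest = rng.normal(0.2, 0.2, size=n - 1)       # assuming 0.2 is the std
        mu[1:] = np.clip(rest, 0.0, 0.4)              # project to [0, 0.4]
    else:
        raise ValueError(f"unknown case: {case}")
    return mu

def pull(mu, i, rng):
    """Draw one Bernoulli reward from arm i."""
    return rng.binomial(1, mu[i])
```

Under this sketch, arm 1 (index 0) is the unique best arm by construction in both cases, which is what the best-arm-identification algorithms are evaluated against.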