Best Arm Identification with Fixed Budget: A Large Deviation Perspective

Authors: Po-An Wang, Ruo-Chun Tzeng, Alexandre Proutiere

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive numerical experiments confirm this observation.We illustrate our results via numerical experiments, and compare CR to other BAI algorithms.5 Numerical Experiments We consider various problem instances to numerically evaluate the performance of CR. In these instances, we vary the number of arms from 5 to 55; we use Bernoulli distributed rewards, and vary the shape of the arm-to-reward mapping. For each instance, we compare CR to SR, SH, and UGap E.
Researcher Affiliation Academia Po-An Wang EECS KTH, Stockholm, Sweden wang9@kth.seRuo-Chun Tzeng EECS KTH, Stockholm, Sweden rctzeng@kth.seAlexandre Proutiere EECS and Digital Futures KTH, Stockholm, Sweden alepro@kth.se
Pseudocode Yes Algorithm 1: SR initialization CK [K], j K; Algorithm 2: CR-C and CR-A Input: θ0 (0, 1 log K ) Q independent of T (can be chosen as small as one wishes) initialization
Open Source Code No The paper does not provide any explicit statement about releasing source code or links to a code repository.
Open Datasets No The paper refers to 'Bernoulli distributed rewards' and 'various problem instances' for numerical experiments but does not specify or provide access information for any publicly available or open datasets. The experiments seem to be simulations based on these described distributions rather than pre-existing datasets.
Dataset Splits No The paper conducts numerical experiments with 'sampling budget T' but does not specify typical training, validation, or test dataset splits or sample counts. The context of 'fixed budget' in bandit problems differs from standard ML dataset splits.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the numerical experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specific solvers).
Experiment Setup Yes The lengths of phases are set as follows. Define log K := 1 2 + PK k=2 1 k. The candidate set is denoted by Cj when it has j > 2 arms. In the corresponding phase, (i) each arm in Cj is sampled until the round t when mink Cj Nk(t) reaches T/(jlog K) (recall that Nk(t) is the number of times arm k has been sampled up to round t); (ii) the empirical worst arm, denoted by ℓj, is then discarded, i.e., Cj 1 = Cj \ {ℓj}. Input: θ0 (0, 1 log K ) Q independent of T (can be chosen as small as one wishes)