Best Arm Identification with Fixed Budget: A Large Deviation Perspective
Authors: Po-An Wang, Ruo-Chun Tzeng, Alexandre Proutiere
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments confirm this observation.We illustrate our results via numerical experiments, and compare CR to other BAI algorithms.5 Numerical Experiments We consider various problem instances to numerically evaluate the performance of CR. In these instances, we vary the number of arms from 5 to 55; we use Bernoulli distributed rewards, and vary the shape of the arm-to-reward mapping. For each instance, we compare CR to SR, SH, and UGap E. |
| Researcher Affiliation | Academia | Po-An Wang EECS KTH, Stockholm, Sweden wang9@kth.seRuo-Chun Tzeng EECS KTH, Stockholm, Sweden rctzeng@kth.seAlexandre Proutiere EECS and Digital Futures KTH, Stockholm, Sweden alepro@kth.se |
| Pseudocode | Yes | Algorithm 1: SR initialization CK [K], j K; Algorithm 2: CR-C and CR-A Input: θ0 (0, 1 log K ) Q independent of T (can be chosen as small as one wishes) initialization |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or links to a code repository. |
| Open Datasets | No | The paper refers to 'Bernoulli distributed rewards' and 'various problem instances' for numerical experiments but does not specify or provide access information for any publicly available or open datasets. The experiments seem to be simulations based on these described distributions rather than pre-existing datasets. |
| Dataset Splits | No | The paper conducts numerical experiments with 'sampling budget T' but does not specify typical training, validation, or test dataset splits or sample counts. The context of 'fixed budget' in bandit problems differs from standard ML dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the numerical experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specific solvers). |
| Experiment Setup | Yes | The lengths of phases are set as follows. Define log K := 1 2 + PK k=2 1 k. The candidate set is denoted by Cj when it has j > 2 arms. In the corresponding phase, (i) each arm in Cj is sampled until the round t when mink Cj Nk(t) reaches T/(jlog K) (recall that Nk(t) is the number of times arm k has been sampled up to round t); (ii) the empirical worst arm, denoted by ℓj, is then discarded, i.e., Cj 1 = Cj \ {ℓj}. Input: θ0 (0, 1 log K ) Q independent of T (can be chosen as small as one wishes) |