Max-Min Grouped Bandits

Authors: Zhenlin Wang, Jonathan Scarlett (pp. 8603-8611)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Sec. 7, we present numerical experiments investigating the relative performance of the algorithms considered.
Researcher Affiliation | Academia | Zhenlin Wang (1), Jonathan Scarlett (1,2); (1) School of Computing, National University of Singapore; (2) Department of Mathematics & Institute of Data Science, National University of Singapore
Pseudocode | Yes | Algorithm 1: Successive Elimination algorithm (a hedged code sketch is given after the table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | In each experiment, we generate 10 instances, each containing 100 arms and 10 possibly overlapping groups. The arm rewards are Bernoulli distributed, and the instances are generated to ensure a pre-specified minimum gap value (∆) as per (5), and we consider ∆ ∈ {0.1, 0.2, 0.4}. [...] We allocate each arm into each group independently with probability 1/10, so that each group contains 10 arms on average. Any arm left unallocated is assigned to a uniformly random group. In addition, for any empty group, we assign 10 uniformly random arms to it. We arbitrarily choose the first group to be the optimal one, and set its worst arm j_0 to have a mean reward of 0.5 (here j_0 is chosen to ensure that G* is unique). We further choose the second group to be a suboptimal group whose worst arm j_1 meets the gap value exactly, i.e., µ_{j_0} - µ_{j_1} = ∆. Then, we generate the other groups' worst arms to be uniform in [0, µ_{j_1}]. Finally, we choose the means of the remaining arms to be uniform in [µ_{j_G}, 1], where j_G is the worst arm of the relevant group G. (An instance-generation sketch is given after the table.)
Dataset Splits | No | The paper describes generating instances for experiments but does not specify traditional training, validation, or test dataset splits, as the methodology addresses a multi-armed bandit problem involving interactive data generation rather than static splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | In each experiment, we generate 10 instances, each containing 100 arms and 10 possibly overlapping groups. The arm rewards are Bernoulli distributed, and the instances are generated to ensure a pre-specified minimum gap value (∆) as per (5), and we consider ∆ ∈ {0.1, 0.2, 0.4}. [...] Confidence bounds. Both Successive Elimination and STABLEOPT rely on the confidence bounds (8)-(9). These bounds are primarily adopted for theoretical convenience, so in our experiments we adopt the more practical choice of µ̂_{j,T_j(t)} ± c/√(T_j(t)) with c = 1. Different choices of c are explored in Appendix E. Stopping conditions. Successive Elimination is defined with a stopping condition, but STABLEOPT is not. A natural stopping condition for STABLEOPT is to stop when the highest max-min LCB value exceeds all other groups' max-min UCB values. However, this often requires an unreasonably large number of pulls, due to the existence of UCB values that are only slightly too low for the algorithm to pull based on. We therefore relax this rule by only requiring it to hold to within a specified tolerance, denoted by η. We set η = 0.01 by default, and explore other values in Appendix E. (The bounds and relaxed stopping rule are sketched in code after the table.)
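For concreteness, below is a minimal Python sketch of a successive-elimination style routine for the max-min grouped bandit objective (identify the group whose worst arm has the largest mean). This is not the paper's Algorithm 1 verbatim: the interface (`pull`, `groups`, `n_arms`), the per-group elimination of candidate worst arms, and the simplified c/sqrt(T) confidence radii are assumptions made for illustration.

```python
import numpy as np

def successive_elimination(pull, groups, n_arms, c=1.0, max_pulls=100_000):
    """Sketch of a successive-elimination style routine for max-min grouped
    bandits.  `pull(j)` returns a stochastic (e.g. Bernoulli) reward for arm j;
    `groups` is a list of arm-index lists.  The elimination rules and the
    c/sqrt(T) radii are simplifications, not the paper's exact Algorithm 1."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    active_groups = set(range(len(groups)))
    # candidate "worst arms" still alive within each group
    active_arms = {g: set(groups[g]) for g in active_groups}

    total = 0
    while len(active_groups) > 1 and total < max_pulls:
        # pull every arm that is still a candidate worst arm of an active group
        to_pull = set().union(*(active_arms[g] for g in active_groups))
        for j in to_pull:
            r = pull(j)
            counts[j] += 1
            means[j] += (r - means[j]) / counts[j]  # incremental empirical mean
            total += 1
        rad = c / np.sqrt(np.maximum(counts, 1))
        ucb, lcb = means + rad, means - rad

        # within each group, drop arms that cannot be that group's worst arm
        for g in active_groups:
            min_ucb = min(ucb[j] for j in active_arms[g])
            active_arms[g] = {j for j in active_arms[g] if lcb[j] <= min_ucb}

        # per-group confidence bounds on the max-min value
        group_ucb = {g: min(ucb[j] for j in active_arms[g]) for g in active_groups}
        group_lcb = {g: min(lcb[j] for j in active_arms[g]) for g in active_groups}
        best_lcb = max(group_lcb.values())
        active_groups = {g for g in active_groups if group_ucb[g] >= best_lcb}

    # return the surviving group with the highest empirical max-min value
    return max(active_groups, key=lambda g: min(means[j] for j in active_arms[g]))
```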
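The instance-generation procedure quoted in the Open Datasets and Experiment Setup rows could be sketched roughly as follows. The function and parameter names (`generate_instance`, `gap`, `seed`) are hypothetical, and the handling of arms shared between overlapping groups, as well as the tie-breaking needed to keep the optimal group unique, is an assumption rather than the authors' exact construction.

```python
import numpy as np

def generate_instance(n_arms=100, n_groups=10, gap=0.2, seed=None):
    """Sketch of the instance generation described above (gap = ∆).
    Arms shared by overlapping groups are handled by an assumption: a
    non-worst arm's mean is drawn above the largest worst-arm mean among
    the groups containing it; uniqueness of G* is not explicitly enforced."""
    rng = np.random.default_rng(seed)

    # each arm joins each group independently with probability 1/10
    member = rng.random((n_groups, n_arms)) < 1.0 / n_groups
    # any arm left unallocated is assigned to one uniformly random group
    for j in np.where(~member.any(axis=0))[0]:
        member[rng.integers(n_groups), j] = True
    # any empty group receives 10 uniformly random arms
    for g in np.where(~member.any(axis=1))[0]:
        member[g, rng.choice(n_arms, size=10, replace=False)] = True
    groups = [list(np.where(member[g])[0]) for g in range(n_groups)]

    # worst-arm means: group 0 is optimal (0.5), group 1 sits exactly `gap`
    # below it, and the remaining groups' worst arms are uniform in [0, 0.5 - gap]
    worst_mean = np.empty(n_groups)
    worst_mean[0], worst_mean[1] = 0.5, 0.5 - gap
    worst_mean[2:] = rng.uniform(0.0, 0.5 - gap, size=n_groups - 2)

    # designate one worst arm per group and give it that group's worst mean
    means = np.full(n_arms, np.nan)
    for g in range(n_groups):
        j_g = int(rng.choice(groups[g]))
        means[j_g] = np.nanmin([means[j_g], worst_mean[g]])

    # remaining arms: uniform above the worst-arm mean(s) of their groups
    for j in range(n_arms):
        if np.isnan(means[j]):
            lo = max(worst_mean[g] for g in range(n_groups) if member[g, j])
            means[j] = rng.uniform(lo, 1.0)
    return groups, means  # pulling arm j yields a Bernoulli(means[j]) reward
```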
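Finally, the practical confidence bounds and the η-relaxed stopping rule described in the Experiment Setup row might look like the following; the helper names (`confidence_bounds`, `relaxed_stop`) and the array-based interface are assumptions, not the authors' code.

```python
import numpy as np

def confidence_bounds(emp_means, counts, c=1.0):
    """Practical confidence bounds from the setup above:
    mu_hat_{j,T_j(t)} +/- c / sqrt(T_j(t)), with c = 1 by default."""
    radius = c / np.sqrt(np.maximum(counts, 1))
    return emp_means - radius, emp_means + radius  # (LCB, UCB) per arm

def relaxed_stop(group_maxmin_lcb, group_maxmin_ucb, eta=0.01):
    """Relaxed stopping rule for a STABLEOPT-style loop: stop once the highest
    max-min LCB exceeds every other group's max-min UCB to within the
    tolerance eta (0.01 by default).  Inputs are per-group max-min LCB/UCB
    values (arrays of length n_groups)."""
    lcb = np.asarray(group_maxmin_lcb)
    ucb = np.asarray(group_maxmin_ucb)
    g_star = int(np.argmax(lcb))
    others = np.delete(ucb, g_star)
    return bool(others.size == 0 or lcb[g_star] >= others.max() - eta)
```

A STABLEOPT-style loop would call `relaxed_stop` once per round on the current per-group max-min bounds and terminate as soon as it returns True, which avoids the unreasonably large pull counts that the unrelaxed (η = 0) rule can require.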