Quantile Bandits for Best Arms Identification

Authors: Mengyan Zhang, Cheng Soon Ong

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show illustrative experiments for best arm identification. In this section, we illustrate how the proposed Q-SAR algorithm works on a toy example (Section 5.1) and demonstrate the empirical performance on a vaccine simulation (Section 5.2).
Researcher Affiliation | Academia | The Australian National University; Data61, CSIRO. Correspondence to: Cheng Soon Ong <chengsoon.ong@anu.edu.au>.
Pseudocode | Yes | Algorithm 1 Q-SAR
Open Source Code | Yes | https://github.com/Mengyanz/QSAR
Open Datasets | No | The paper describes generating its own data for experiments (e.g., 'We generate 1000 rewards for each strategy by simulating the epidemic for 180 days using FluTE', 'constructing three arms with absolute Gaussian distribution or exponential distribution') and links to the simulation tool, but does not state that the generated dataset is publicly available or provide a direct link or citation to it.
Dataset Splits | No | The paper sets a 'fixed budget of N rounds' for the bandit problem, which determines the number of samples, but it does not describe training, validation, or test splits in terms of percentages or counts, nor does it refer to standard predefined splits.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'FluTE' (with a link to its GitHub repository) as the tool used for the vaccine simulation, but does not specify a version number for it or for any other software dependency, such as programming languages or libraries.
Experiment Setup | Yes | We divide the budget $N$ into $K-1$ phases. The number of samples drawn for each arm in each phase remains the same as in Bubeck et al. (2013). Let the active set $A_1 = \{1, \dots, K\}$, the accepted set $M_1 = \emptyset$, the number of arms left to find $l_1 = m$, $\overline{\log}(K) = \frac{1}{2} + \sum_{i=2}^{K} \frac{1}{i}$, $n_0 = 0$, and for $p \in \{1, \dots, K-1\}$, $n_p = \left\lceil \frac{1}{\overline{\log}(K)} \frac{N-K}{K+1-p} \right\rceil$. We set up simulated environments by constructing three arms with absolute Gaussian distribution or exponential distribution. We generate 1000 rewards for each strategy by simulating the epidemic for 180 days using FluTE (with basic reproduction number $R_0 = 1.3$).
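
The phase schedule quoted in the Experiment Setup entry can be illustrated with a short Python sketch. It is not taken from the authors' repository. The function names `log_bar`, `phase_lengths`, and `max_total_pulls` are hypothetical, and the budget check assumes that at most K + 1 - p arms remain active in phase p, as in the SAR schedule of Bubeck et al. (2013).

```python
import math
from typing import List


def log_bar(K: int) -> float:
    """Return 1/2 + sum_{i=2}^{K} 1/i, the normalizer in the phase schedule."""
    return 0.5 + sum(1.0 / i for i in range(2, K + 1))


def phase_lengths(N: int, K: int) -> List[int]:
    """Return [n_0, n_1, ..., n_{K-1}] for budget N and K arms.

    n_0 = 0 and n_p = ceil((1 / log_bar(K)) * (N - K) / (K + 1 - p)).
    n_p is the cumulative number of samples per active arm after phase p.
    """
    if K < 2 or N < K:
        raise ValueError("need at least 2 arms and a budget of at least K")
    normalizer = log_bar(K)
    schedule = [0]
    for p in range(1, K):
        schedule.append(math.ceil((N - K) / (normalizer * (K + 1 - p))))
    return schedule


def max_total_pulls(N: int, K: int) -> int:
    """Upper bound on total pulls, assuming (hypothetically, as in SAR)
    that at most K + 1 - p arms are still active in phase p."""
    n = phase_lengths(N, K)
    return sum((K + 1 - p) * (n[p] - n[p - 1]) for p in range(1, K))


if __name__ == "__main__":
    N, K = 300, 3  # illustrative values, not taken from the paper
    print("cumulative samples per arm:", phase_lengths(N, K))
    print("pulls used at most:", max_total_pulls(N, K), "of budget", N)
```

Under that assumption, each arm still active in phase p receives n_p - n_{p-1} new samples, so the example can be used to check that a given choice of N and K stays within the budget.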