Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Authors: Balázs Szörényi, Róbert Busa-Fekete, Paul Weng, Eyke Hüllermeier

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | These properties are also illustrated by means of first experimental studies. In the first experiment, we compare the performance of the QPAC algorithm with a standard PAC bandit algorithm on bandit problems for which the (ϵ, τ)-optimal arm coincides with the ϵ-best arm in terms of means, thereby assuring that both learners are seeking the same arm. As a baseline, we run the SUCCESSIVEELIMINATION learner (SE) by Even-Dar et al. (2002), which is an (ϵ, δ)-PAC learner (i.e., it returns an ϵ-optimal arm with probability at least 1 − δ). To guarantee a fair comparison, the confidence interval defined in (2) is used in our implementation of SE, which differs only in constants from the one of Even-Dar et al. (2002). We tested the algorithm on the Needle in the Haystack (NHS) problem, which consists of arms obeying a Bernoulli distribution with parameter p, except for one of them, the target, which has a slightly higher parameter p + ∆p.
Researcher Affiliation | Academia | Balázs Szörényi (1,5,6) SZORENYI@INF.U-SZEGED.HU; Róbert Busa-Fekete (2) BUSAROBI@UPB.DE; Paul Weng (3,4) PAWENG@CMU.EDU; Eyke Hüllermeier (2) EYKE@UPB.DE. (1) INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, France; (2) Department of Computer Science, University of Paderborn, Warburger Str. 100, 33098 Paderborn, Germany; (3) SYSU-CMU Joint Institute of Engineering, 132 East Waihuan Road, Guangzhou, 510006, China; (4) SYSU-CMU Shunde International Joint Research Institute, 9 Eastern Nanguo Road, Shunde, 528300, China; (5) MTA-SZTE Research Group on Artificial Intelligence, Tisza Lajos krt. 103., H-6720 Szeged, Hungary; (6) Department of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa, Israel 32000
Pseudocode | Yes | Algorithm 1 QPAC(δ, ϵ, τ) and Algorithm 2 QUCB(τ) are explicitly present in the paper.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes generating synthetic data for experiments ("We generated random bandit instances", "Multinomial arm distributions as described in the previous section, with parameters drawn uniformly at random") and does not mention using or providing access to any publicly available or open datasets with specific links or citations.
Dataset Splits | No | The paper focuses on online bandit learning, where data is generated through interaction. It does not describe traditional train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | The confidence parameter δ was set to 0.05 for each run; accordingly, the average accuracy was significantly above 1 − δ = 0.95 in each case. The approximation error ϵ was set to 0.01, 0.03 and 0.05. We set x1 = 1, x2 = 2 and x3 = 3 in the case of UCB. The parameter δ was set to 0.1. For the quantiles, we used τ ∈ {0.5, 0.9}. The NHS problem with parameters p = 0.45, ∆p = 0.1, τ = 0.5 and p = 0.85, ∆p = 0.1, τ = 0.9.
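The SUCCESSIVEELIMINATION baseline and the Needle in the Haystack instance quoted in the table can be sketched in a few lines. The sketch below is illustrative only: it assumes Bernoulli arms and uses a generic Hoeffding-style confidence radius, not the paper's exact interval (2), and the names `successive_elimination`, `nhs`, and `dp` are ours, not the paper's.

```python
import math
import random

def successive_elimination(arms, eps, delta, max_rounds=200_000, seed=0):
    """Sketch of Successive Elimination (Even-Dar et al., 2002).

    `arms` holds Bernoulli parameters. All active arms are pulled once per
    round; an arm is dropped when its empirical mean falls a confidence
    gap below the current best. Stops once the radius is below eps/2.
    """
    rng = random.Random(seed)
    K = len(arms)
    active = list(range(K))
    total = [0.0] * K  # cumulative reward per arm
    for t in range(1, max_rounds + 1):
        for a in active:
            total[a] += 1.0 if rng.random() < arms[a] else 0.0
        # Hoeffding-style radius; constants are illustrative, not the
        # interval (2) used in the paper.
        rad = math.sqrt(math.log(4.0 * K * t * t / delta) / (2.0 * t))
        best = max(total[a] / t for a in active)
        if rad <= eps / 2.0 or len(active) == 1:
            break
        active = [a for a in active if total[a] / t >= best - 2.0 * rad]
    return max(active, key=lambda a: total[a] / t)

# Needle-in-the-Haystack (NHS) instance: nine arms with parameter p and
# one target arm with parameter p + dp, as in the experiment setup.
p, dp = 0.45, 0.1
nhs = [p] * 9 + [p + dp]
winner = successive_elimination(nhs, eps=0.05, delta=0.05)
```

With high probability the returned index is the target arm, since the gap ∆p = 0.1 exceeds ϵ = 0.05, so the target is the only ϵ-optimal arm in terms of means.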
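For the quantile targets τ ∈ {0.5, 0.9} above, a τ-quantile can be bracketed from samples with a distribution-free confidence band. The sketch below uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality as a stand-in for the paper's interval (2); the function name and constants are our illustrative choices.

```python
import math

def empirical_quantile_bounds(samples, tau, delta):
    """Bracket the tau-quantile of a distribution from i.i.d. samples.

    By the DKW inequality, with probability >= 1 - delta the empirical
    CDF is uniformly within c = sqrt(ln(2/delta) / (2n)) of the true
    CDF, so the true tau-quantile lies between the empirical
    (tau - c)- and (tau + c)-quantiles (clipped to the sample range).
    """
    xs = sorted(samples)
    n = len(xs)
    c = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    lo_rank = max(0, math.ceil(n * (tau - c)) - 1)
    hi_rank = min(n - 1, math.ceil(n * (tau + c)) - 1)
    return xs[lo_rank], xs[hi_rank]

# Example: 1000 evenly spaced points on [0, 1]; the median is ~0.5.
lo, hi = empirical_quantile_bounds([i / 999 for i in range(1000)],
                                   tau=0.5, delta=0.1)
```

The band shrinks at rate O(n^{-1/2}), which is what lets a quantile-based PAC learner narrow down the (ϵ, τ)-optimal arm from repeated pulls.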