Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Authors: Balázs Szörényi, Róbert Busa-Fekete, Paul Weng, Eyke Hüllermeier

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | These properties are also illustrated by means of first experimental studies. In the first experiment, we compare the performance of the QPAC algorithm with a standard PAC bandit algorithm on bandit problems for which the (ϵ, τ)-optimal arm coincides with the ϵ-best arm in terms of means, thereby assuring that both learners are seeking the same arm. As a baseline, we run the SUCCESSIVEELIMINATION learner (SE) by Even-Dar et al. (2002), which is an (ϵ, δ)-PAC learner (i.e., it returns an ϵ-optimal arm with probability at least 1 − δ). To guarantee a fair comparison, the confidence interval defined in (2) is used in our implementation of SE, which differs only in constants from the one of Even-Dar et al. (2002). We tested the algorithm on the Needle in the Haystack (NHS) problem, which consists of arms obeying a Bernoulli distribution with parameter p, except for one of them, the target, which has a slightly higher parameter p + ∆p.
Researcher Affiliation | Academia | Balázs Szörényi (1,5,6) SZORENYI@INF.U-SZEGED.HU; Róbert Busa-Fekete (2) BUSAROBI@UPB.DE; Paul Weng (3,4) PAWENG@CMU.EDU; Eyke Hüllermeier (2) EYKE@UPB.DE. (1) INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, France; (2) Department of Computer Science, University of Paderborn, Warburger Str. 100, 33098 Paderborn, Germany; (3) SYSU-CMU Joint Institute of Engineering, 132 East Waihuan Road, Guangzhou, 510006, China; (4) SYSU-CMU Shunde International Joint Research Institute, 9 Eastern Nanguo Road, Shunde, 528300, China; (5) MTA-SZTE Research Group on Artificial Intelligence, Tisza Lajos krt. 103., H-6720 Szeged, Hungary; (6) Department of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa, Israel 32000
Pseudocode | Yes | Algorithm 1 QPAC(δ, ϵ, τ) and Algorithm 2 QUCB(τ) are explicitly present in the paper.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes generating synthetic data for experiments ("We generated random bandit instances", "Multinomial arm distributions as described in the previous section, with parameters drawn uniformly at random") and does not mention using or providing access to any publicly available or open datasets with specific links or citations.
Dataset Splits | No | The paper focuses on online bandit learning, where data is generated through interaction. It does not describe traditional train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | The confidence parameter δ was set to 0.05 for each run; accordingly, the average accuracy was significantly above 1 − δ = 0.95 in each case. The approximation error ϵ was set to 0.01, 0.03 and 0.05. We set x1 = 1, x2 = 2 and x3 = 3 in the case of UCB. The parameter δ was set to 0.1. For the quantiles, we used τ ∈ {0.5, 0.9}. The NHS problem with parameters p = 0.45, ∆p = 0.1, τ = 0.5 and p = 0.85, ∆p = 0.1, τ = 0.9.
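The SUCCESSIVEELIMINATION baseline and the Needle in the Haystack instance quoted in the table can be sketched in a few lines. The sketch below is illustrative only: it assumes Bernoulli arms and uses a generic Hoeffding-style confidence radius, not the paper's exact interval (2), and the names `successive_elimination`, `nhs`, and `dp` are ours, not the paper's.

```python
import math
import random

def successive_elimination(arms, eps, delta, max_rounds=200_000, seed=0):
    """Sketch of Successive Elimination (Even-Dar et al., 2002).

    `arms` holds Bernoulli parameters. All active arms are pulled once per
    round; an arm is dropped when its empirical mean falls a confidence
    gap below the current best. Stops once the radius is below eps/2.
    """
    rng = random.Random(seed)
    K = len(arms)
    active = list(range(K))
    total = [0.0] * K  # cumulative reward per arm
    for t in range(1, max_rounds + 1):
        for a in active:
            total[a] += 1.0 if rng.random() < arms[a] else 0.0
        # Hoeffding-style radius; constants are illustrative, not the
        # interval (2) used in the paper.
        rad = math.sqrt(math.log(4.0 * K * t * t / delta) / (2.0 * t))
        best = max(total[a] / t for a in active)
        if rad <= eps / 2.0 or len(active) == 1:
            break
        active = [a for a in active if total[a] / t >= best - 2.0 * rad]
    return max(active, key=lambda a: total[a] / t)

# Needle-in-the-Haystack (NHS) instance: nine arms with parameter p and
# one target arm with parameter p + dp, as in the experiment setup.
p, dp = 0.45, 0.1
nhs = [p] * 9 + [p + dp]
winner = successive_elimination(nhs, eps=0.05, delta=0.05)
```

With high probability the returned index is the target arm, since the gap ∆p = 0.1 exceeds ϵ = 0.05, so the target is the only ϵ-optimal arm in terms of means.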
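For the quantile targets τ ∈ {0.5, 0.9} above, a τ-quantile can be bracketed from samples with a distribution-free confidence band. The sketch below uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality as a stand-in for the paper's interval (2); the function name and constants are our illustrative choices.

```python
import math

def empirical_quantile_bounds(samples, tau, delta):
    """Bracket the tau-quantile of a distribution from i.i.d. samples.

    By the DKW inequality, with probability >= 1 - delta the empirical
    CDF is uniformly within c = sqrt(ln(2/delta) / (2n)) of the true
    CDF, so the true tau-quantile lies between the empirical
    (tau - c)- and (tau + c)-quantiles (clipped to the sample range).
    """
    xs = sorted(samples)
    n = len(xs)
    c = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    lo_rank = max(0, math.ceil(n * (tau - c)) - 1)
    hi_rank = min(n - 1, math.ceil(n * (tau + c)) - 1)
    return xs[lo_rank], xs[hi_rank]

# Example: 1000 evenly spaced points on [0, 1]; the median is ~0.5.
lo, hi = empirical_quantile_bounds([i / 999 for i in range(1000)],
                                   tau=0.5, delta=0.1)
```

The band shrinks at rate O(n^{-1/2}), which is what lets a quantile-based PAC learner narrow down the (ϵ, τ)-optimal arm from repeated pulls.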