Identification of the Generalized Condorcet Winner in Multi-dueling Bandits

Authors: Björn Haddenhorst, Viktor Bengs, Eyke Hüllermeier

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In a numerical study, we show that DKWT empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.In the following, we present experimental results on the performance of our GCW identification solution.
Researcher Affiliation Academia Björn Haddenhorst Department of Computer Science Paderborn University Paderborn, Germany bjoernha@mail.upb.de Viktor Bengs Institute of Informatics University of Munich (LMU) Munich, Germany viktor.bengs@lmu.de Eyke Hüllermeier Institute of Informatics University of Munich (LMU) Munich, Germany eyke@ifi.lmu.de
Pseudocode Yes Algorithm 1 DKW mode-identification with abstention Input: γ (0, 1), h (0, 1), access to iid samples Xt Cat(p) 1: T 8 ln(4/γ)/h2 2: Observe samples X1, . . . , XT 3: Calculate ˆp T = (ˆp T 1 , . . . , ˆp T k ) as ˆp T i := 1 T PT t=1 1{Xt=i}, i [k] 4: Choose i mode(ˆp T ) 5: if ˆp T i > maxj =i ˆp T j + h then return i 6: else return UNSURE and Algorithm 2 DKW mode-identification Solution to Pk,γ k ( 0) and Algorithm 3 DVORETZKY KIEFER WOLFOWITZ TOURNAMENT Solution to Pm,γ k ( GCW 0)
Open Source Code Yes Our implementation is provided at https://github.com/bjoernhad/GCWidentification.
Open Datasets No The experiments are conducted on probability models generated with specific parameters (e.g., 'P PM 5 k(PL) with underlying PL-parameter θ') or sampled uniformly at random, rather than using a distinct public dataset with explicit access information.
Dataset Splits No The paper's empirical evaluation focuses on the sample complexity and accuracy of bandit algorithms based on observations drawn from a probabilistic model, rather than using a fixed dataset with explicit training, validation, or test splits. The term 'samples' refers to the observations generated by the model during the algorithm's execution, not pre-split data portions.
Hardware Specification Yes All experiments were conducted on a machine with an Intel Core i7-4700MQ Processor, executing all experiments (including those in the supplemental material) with only one CPU core in use took less than 72 hours.
Software Dependencies No The paper states that its implementation is provided at a GitHub link but does not explicitly list any software dependencies with specific version numbers within the text.
Experiment Setup Yes We restrict ourselves in the main paper to DKWT, which is our solution of the most general problem Pm,γ k ( h GCW 0). Throughout all experiments, if not specified differently in the pseudocode, every choice of an element within a specific set made by DKWT is performed uniformly at random. and Table 3 shows the results of both algorithms when started on an instance P PM 5 k(PL) with underlying PL-parameter θ = (1, 0.8, 0.6, 0.4, 0.2) and γ = 0.05, for different values of k.