Identification of the Generalized Condorcet Winner in Multi-dueling Bandits
Authors: Björn Haddenhorst, Viktor Bengs, Eyke Hüllermeier
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a numerical study, we show that DKWT empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.In the following, we present experimental results on the performance of our GCW identification solution. |
| Researcher Affiliation | Academia | Björn Haddenhorst Department of Computer Science Paderborn University Paderborn, Germany bjoernha@mail.upb.de Viktor Bengs Institute of Informatics University of Munich (LMU) Munich, Germany viktor.bengs@lmu.de Eyke Hüllermeier Institute of Informatics University of Munich (LMU) Munich, Germany eyke@ifi.lmu.de |
| Pseudocode | Yes | Algorithm 1 DKW mode-identification with abstention Input: γ (0, 1), h (0, 1), access to iid samples Xt Cat(p) 1: T 8 ln(4/γ)/h2 2: Observe samples X1, . . . , XT 3: Calculate ˆp T = (ˆp T 1 , . . . , ˆp T k ) as ˆp T i := 1 T PT t=1 1{Xt=i}, i [k] 4: Choose i mode(ˆp T ) 5: if ˆp T i > maxj =i ˆp T j + h then return i 6: else return UNSURE and Algorithm 2 DKW mode-identification Solution to Pk,γ k ( 0) and Algorithm 3 DVORETZKY KIEFER WOLFOWITZ TOURNAMENT Solution to Pm,γ k ( GCW 0) |
| Open Source Code | Yes | Our implementation is provided at https://github.com/bjoernhad/GCWidentification. |
| Open Datasets | No | The experiments are conducted on probability models generated with specific parameters (e.g., 'P PM 5 k(PL) with underlying PL-parameter θ') or sampled uniformly at random, rather than using a distinct public dataset with explicit access information. |
| Dataset Splits | No | The paper's empirical evaluation focuses on the sample complexity and accuracy of bandit algorithms based on observations drawn from a probabilistic model, rather than using a fixed dataset with explicit training, validation, or test splits. The term 'samples' refers to the observations generated by the model during the algorithm's execution, not pre-split data portions. |
| Hardware Specification | Yes | All experiments were conducted on a machine with an Intel Core i7-4700MQ Processor, executing all experiments (including those in the supplemental material) with only one CPU core in use took less than 72 hours. |
| Software Dependencies | No | The paper states that its implementation is provided at a GitHub link but does not explicitly list any software dependencies with specific version numbers within the text. |
| Experiment Setup | Yes | We restrict ourselves in the main paper to DKWT, which is our solution of the most general problem Pm,γ k ( h GCW 0). Throughout all experiments, if not specified differently in the pseudocode, every choice of an element within a specific set made by DKWT is performed uniformly at random. and Table 3 shows the results of both algorithms when started on an instance P PM 5 k(PL) with underlying PL-parameter θ = (1, 0.8, 0.6, 0.4, 0.2) and γ = 0.05, for different values of k. |