Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Identification of the Generalized Condorcet Winner in Multi-dueling Bandits
Authors: Björn Haddenhorst, Viktor Bengs, Eyke Hüllermeier
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a numerical study, we show that DKWT empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.In the following, we present experimental results on the performance of our GCW identification solution. |
| Researcher Affiliation | Academia | Björn Haddenhorst Department of Computer Science Paderborn University Paderborn, Germany EMAIL Viktor Bengs Institute of Informatics University of Munich (LMU) Munich, Germany EMAIL Eyke Hüllermeier Institute of Informatics University of Munich (LMU) Munich, Germany EMAIL |
| Pseudocode | Yes | Algorithm 1 DKW mode-identification with abstention Input: γ (0, 1), h (0, 1), access to iid samples Xt Cat(p) 1: T 8 ln(4/γ)/h2 2: Observe samples X1, . . . , XT 3: Calculate ˆp T = (ˆp T 1 , . . . , ˆp T k ) as ˆp T i := 1 T PT t=1 1{Xt=i}, i [k] 4: Choose i mode(ˆp T ) 5: if ˆp T i > maxj =i ˆp T j + h then return i 6: else return UNSURE and Algorithm 2 DKW mode-identification Solution to Pk,γ k ( 0) and Algorithm 3 DVORETZKY KIEFER WOLFOWITZ TOURNAMENT Solution to Pm,γ k ( GCW 0) |
| Open Source Code | Yes | Our implementation is provided at https://github.com/bjoernhad/GCWidentification. |
| Open Datasets | No | The experiments are conducted on probability models generated with specific parameters (e.g., 'P PM 5 k(PL) with underlying PL-parameter θ') or sampled uniformly at random, rather than using a distinct public dataset with explicit access information. |
| Dataset Splits | No | The paper's empirical evaluation focuses on the sample complexity and accuracy of bandit algorithms based on observations drawn from a probabilistic model, rather than using a fixed dataset with explicit training, validation, or test splits. The term 'samples' refers to the observations generated by the model during the algorithm's execution, not pre-split data portions. |
| Hardware Specification | Yes | All experiments were conducted on a machine with an Intel Core i7-4700MQ Processor, executing all experiments (including those in the supplemental material) with only one CPU core in use took less than 72 hours. |
| Software Dependencies | No | The paper states that its implementation is provided at a GitHub link but does not explicitly list any software dependencies with specific version numbers within the text. |
| Experiment Setup | Yes | We restrict ourselves in the main paper to DKWT, which is our solution of the most general problem Pm,γ k ( h GCW 0). Throughout all experiments, if not specified differently in the pseudocode, every choice of an element within a specific set made by DKWT is performed uniformly at random. and Table 3 shows the results of both algorithms when started on an instance P PM 5 k(PL) with underlying PL-parameter θ = (1, 0.8, 0.6, 0.4, 0.2) and γ = 0.05, for different values of k. |