Combinatorial Pure Exploration for Dueling Bandit
Authors: Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study combinatorial pure exploration for dueling bandits (CPE-DB): we have multiple candidates for multiple positions as modeled by a bipartite graph, and in each round we sample a duel of two candidates on one position and observe who wins in the duel, with the goal of finding the best candidate-position matching with high probability after multiple rounds of samples. [...] For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round. For Condorcet winner, we first design a fully polynomial time approximation scheme (FPTAS) for the offline problem of finding the Condorcet winner with known winning probabilities, and then use the FPTAS as an oracle to design a novel pure exploration algorithm CAR-Cond with sample complexity analysis. |
| Researcher Affiliation | Collaboration | 1Microsoft Research, Beijing, China 2IIIS, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 CLUCB-Borda-PAC (Page 4), Algorithm 2 CAR-Cond (Page 6), Algorithm 3 CAR-Parallel (Page 7), Algorithm 4 CAR-Verify (Page 8). |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the methodology is openly available. |
| Open Datasets | No | The paper is theoretical and does not use or refer to any specific publicly available datasets for training or evaluation. The concept of 'sampling duels' is part of their theoretical model, not an external dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, therefore no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software names with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments, therefore no experimental setup details like hyperparameters or training configurations are provided. |