Dueling Bandits with Weak Regret
Authors: Bangrui Chen, Peter I. Frazier
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | WS is simple to compute, even for problems with many arms, and we demonstrate through numerical experiments on simulated and real data that WS has significantly smaller regret than existing algorithms in both the weak- and strong-regret settings. |
| Researcher Affiliation | Academia | 1Cornell University, Ithaca, NY. Correspondence to: Peter I. Frazier <pf98@cornell.edu>. |
| Pseudocode | Yes | Algorithm 1 WS-W Algorithm 2 WS-S |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We now compare WS-W with RUCB and QSA using simulated data and the Yelp academic dataset (Yelp, 2012). We use the sushi and MSLR datasets, which were previously used by Komiyama et al. (2016) and Zoghi et al. (2015) respectively to evaluate dueling bandit algorithms. |
| Dataset Splits | No | The paper operates within a dueling bandit/online learning framework and does not describe traditional training/validation/test dataset splits with specific percentages or counts. It evaluates performance over time on simulated and real datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'doc2vec (Rehurek & Sojka, 2010)' but does not specify a version number. No other software components with version numbers are listed. |
| Experiment Setup | Yes | WS-S has a user-defined parameter β. In our experiments we set β = 1.1. |
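To make the dueling-bandit setting referenced above concrete, the sketch below simulates a generic winner-stays-style loop on a simulated preference matrix. This is a minimal illustration under assumptions: the preference matrix, the random-challenger rule, and the function names are all hypothetical, and this is not the paper's exact WS-W/WS-S algorithm (those are given as Algorithm 1 and Algorithm 2 in the paper).

```python
import random

def duel(p, i, j, rng):
    """Simulate a duel: arm i beats arm j with probability p[i][j]."""
    return i if rng.random() < p[i][j] else j

def winner_stays(p, horizon, rng=None):
    """Illustrative winner-stays-style loop (an assumption, NOT the
    paper's exact WS-W): the winner of each duel is retained as the
    incumbent and faces a uniformly random challenger."""
    rng = rng or random.Random(0)
    n = len(p)
    incumbent = 0
    wins = [0] * n  # duels won by each arm
    for _ in range(horizon):
        challenger = rng.randrange(n)
        while challenger == incumbent:
            challenger = rng.randrange(n)
        incumbent = duel(p, incumbent, challenger, rng)
        wins[incumbent] += 1
    return incumbent, wins

# Hypothetical 3-arm preference matrix with a Condorcet winner (arm 0):
# p[i][j] = probability that arm i beats arm j.
p = [[0.5, 0.7, 0.8],
     [0.3, 0.5, 0.6],
     [0.2, 0.4, 0.5]]
best, wins = winner_stays(p, horizon=2000)
```

Under weak regret, no cost is incurred in a time step as long as the best arm is one of the two arms dueled, which is why a scheme that keeps the incumbent winner in play can drive weak regret to a constant once the Condorcet winner becomes incumbent.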