Dueling Bandits with Weak Regret

Authors: Bangrui Chen, Peter I. Frazier

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "WS is simple to compute, even for problems with many arms, and we demonstrate through numerical experiments on simulated and real data that WS has significantly smaller regret than existing algorithms in both the weak- and strong-regret settings."
Researcher Affiliation | Academia | "Cornell University, Ithaca, NY. Correspondence to: Peter I. Frazier <pf98@cornell.edu>."
Pseudocode | Yes | "Algorithm 1 WS-W" and "Algorithm 2 WS-S"
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "We now compare WS-W with RUCB and QSA using simulated data and the Yelp academic dataset (Yelp, 2012)." "We use the sushi and MSLR datasets, which were previously used by Komiyama et al. (2016) and Zoghi et al. (2015) respectively to evaluate dueling bandit algorithms."
Dataset Splits | No | The paper operates in a dueling-bandit/online-learning framework and does not describe traditional training/validation/test splits with specific percentages or counts; performance is instead evaluated over time on simulated and real datasets.
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., CPU/GPU models or memory specifications).
Software Dependencies | No | The paper mentions doc2vec (Rehurek & Sojka, 2010) but gives no version number, and no other software components with versions are listed.
Experiment Setup | Yes | "WS-S has a user-defined parameter β. In our experiments we set β = 1.1."
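The table above references the paper's Algorithm 1 (WS-W) and Algorithm 2 (WS-S). As a rough illustration of the "Winner Stays" idea behind those names, the sketch below implements a generic winner-stays duel loop: the current incumbent arm keeps playing, and losing a duel promotes the challenger. This is a minimal interpretation under stated assumptions, not the paper's exact pseudocode; the function name `winner_stays`, the `prefer` matrix, and the challenger-rotation rule are all hypothetical choices for the example (the paper's WS-S additionally interleaves exploitation phases whose lengths grow geometrically with the user-defined parameter β, set to 1.1 in its experiments).

```python
import random

def winner_stays(prefer, horizon, rng):
    """Illustrative winner-stays duel loop (an interpretation, not the
    paper's exact WS-W): the incumbent arm keeps playing, and a lost
    duel promotes the challenger to incumbent."""
    num_arms = len(prefer)
    incumbent = 0
    history = []
    for t in range(horizon):
        # Hypothetical round-robin challenger schedule; the paper's own
        # challenger selection rule may differ.
        challenger = (incumbent + 1 + t) % num_arms
        if challenger == incumbent:
            challenger = (incumbent + 1) % num_arms
        # prefer[i][j]: probability that arm i beats arm j in one duel.
        if rng.random() >= prefer[incumbent][challenger]:
            incumbent = challenger  # incumbent lost; the winner stays
        history.append(incumbent)
    return history

# Toy preference matrix in which arm 2 beats every other arm with
# probability 1, so the incumbent should lock onto arm 2 and stay there.
prefer = [
    [0.5, 1.0, 0.0],
    [0.0, 0.5, 0.0],
    [1.0, 1.0, 0.5],
]
history = winner_stays(prefer, horizon=50, rng=random.Random(0))
```

With this deterministic toy matrix the incumbent switches to arm 2 on the second duel and never relinquishes it, which is the "winner stays" behavior in miniature.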