On Weak Regret Analysis for Dueling Bandits
Authors: El Mehdi Saad, Alexandra Carpentier, Tomáš Kocák, Nicolas Verzelen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform a numerical evaluation of WR-EXP3-IX and WR-TINF algorithms in three different scenarios that favor different algorithms according to the prior theoretical results. As a benchmark for our experiments, we utilize the state-of-the-art algorithm for weak regret, WS-W [4]. Additionally, we include one of the best-performing algorithms for strong regret, Versatile-DB [14], to demonstrate that optimizing for strong regret does not necessarily translate into optimal weak regret performance. For each of the experiments, we plot the mean regret over 20 iterations together with 0.2 and 0.8 quantiles. All the experiments in this section use theoretical values of parameters for the algorithms. |
| Researcher Affiliation | Academia | El Mehdi Saad (KAUST, mehdi.saad@kaust.edu.sa); Alexandra Carpentier (Institut für Mathematik, Universität Potsdam, carpentier@uni-potsdam.de); Tomáš Kocák (Institut für Mathematik, Universität Potsdam, kocak@uni-potsdam.de); Nicolas Verzelen (INRAE, MISTEA, Univ. Montpellier, nicolas.verzelen@inrae.fr) |
| Pseudocode | Yes | Algorithm 1 WR-TINF |
| Open Source Code | Yes | Section 6 provides all the details needed to reproduce the simulations presented in our paper. The code is provided as well. |
| Open Datasets | No | In our experiments, we used data generated synthetically. The description of the distributions of the duels considered is provided in Section 6. Because the data are synthetic, the paper gives no access information for a publicly available dataset. |
| Dataset Splits | No | The paper mentions 'mean regret over 20 iterations together with 0.2 and 0.8 quantiles' in the experiments section but does not specify training, validation, or test splits. The problem is a sequential game, not a typical supervised learning task with train/val/test splits. |
| Hardware Specification | No | The paper states only that "the runtime of each algorithm and iteration is in terms of minutes on a personal computer." This is too vague to count as a hardware specification: no CPU or GPU model, memory size, or processor type is given. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | No | The paper states that "all the experiments in this section use theoretical values of parameters for the algorithms." It does not list specific hyperparameter values, training configurations, or system-level settings in the main text that would allow the experiment setup to be replicated concretely. |
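
The Research Type row quotes the paper's reporting convention: mean regret over 20 iterations together with the 0.2 and 0.8 quantiles. A minimal sketch of that aggregation step is given below. It is not the authors' code: the helper names `quantile`, `summarize_runs`, and `placeholder_run` are hypothetical, the linear-interpolation quantile is an assumed convention, and the regret curves are random placeholders rather than output of WR-EXP3-IX, WR-TINF, WS-W, or Versatile-DB.

```python
import random
from statistics import mean


def quantile(values, q):
    """Linearly interpolated q-quantile of a non-empty list (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def summarize_runs(curves, q_low=0.2, q_high=0.8):
    """Per-round (mean, low quantile, high quantile) across independent runs.

    curves: list of runs, each a list of cumulative regret values,
            all of the same length (the horizon).
    """
    horizon = len(curves[0])
    summary = []
    for t in range(horizon):
        column = [run[t] for run in curves]
        summary.append(
            (mean(column), quantile(column, q_low), quantile(column, q_high))
        )
    return summary


def placeholder_run(horizon, rng):
    """Assumption: a stand-in cumulative regret curve, NOT a bandit algorithm."""
    total, curve = 0.0, []
    for _ in range(horizon):
        total += 1.0 if rng.random() < 0.1 else 0.0
        curve.append(total)
    return curve


rng = random.Random(0)  # fixed seed so the placeholder data is repeatable
curves = [placeholder_run(100, rng) for _ in range(20)]  # 20 runs, as in the paper
summary = summarize_runs(curves)
```

Each entry of `summary` holds the three values that would be plotted for one round: the mean curve and the lower and upper edges of the quantile band. Replacing `placeholder_run` with real per-run regret curves is the only change needed to apply the same aggregation to actual experiments.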