Adversarial Dueling Bandits
Authors: Aadirupa Saha, Tomer Koren, Yishay Mansour
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we corroborate the theoretical results with empirical evaluations. In this section we evaluate the empirical performance of our proposed algorithm Dueling-EXP3 and compare it with REX3, the only other existing adversarial dueling bandit algorithm, although REX3 is known to work only under the restricted class of linear utility-based preferences (Gajane et al., 2015; Ailon et al., 2014). |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, New York City; ²Blavatnik School of Computer Science, Tel Aviv University, and Google Research, Tel Aviv. Correspondence to: Aadirupa Saha <aadirupa.saha@microsoft.com>. |
| Pseudocode | Yes | Algorithm 1: Dueling-EXP3 (D-EXP3); Algorithm 2: Dueling-EXP3 (High Probability); Algorithm 3: Borda-Confidence-Bound (BCB) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic data for experiments (e.g., "Switching Borda or SB(t)", "Random-walk preferences or RW(ν)", "Lower Bound instance or LB(ϵ)") rather than using a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes setting up and running simulations in an online learning (bandit) environment. It does not mention or specify traditional training, validation, or test dataset splits (e.g., percentages, sample counts) as would be common in supervised learning tasks. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Algorithm 1 Dueling-EXP3 (D-EXP3). Input: item set indexed by [K], learning rate η > 0, parameter γ ∈ (0, 1). Initialize: initial probability distribution q₁(i) = 1/K for all i ∈ [K]. Let η = ((log K)/(TK))^{2/3} and γ = ηK. For any T, the expected regret of Algorithm 1 satisfies E[R_T] ≤ 6(K log K)^{1/3} T^{2/3}. The explicit values used for τ, ν, ϵ are specified in the corresponding figures. |
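The experiment-setup row above describes an EXP3-style loop over a K-item set with the theorem's tuning of η and γ. The following is a minimal, hypothetical Python sketch of such a dueling loop, not the paper's exact D-EXP3: the function name `dueling_exp3`, the preference callback `pref`, and the simplified importance-weighted Borda-score estimator are all assumptions made for illustration.

```python
import math
import random

def dueling_exp3(K, T, pref, rng=None):
    """Illustrative EXP3-style dueling-bandit loop (sketch, not the paper's exact algorithm).

    pref(i, j) -> probability that arm i beats arm j in a duel.
    Returns the final sampling distribution and the list of duels played.
    """
    rng = rng or random.Random(0)
    # Tuning from the stated theorem: eta = ((log K)/(TK))^(2/3), gamma = eta*K
    # (gamma is clamped to 0.5 here purely as a numerical safeguard).
    eta = (math.log(K) / (T * K)) ** (2.0 / 3.0)
    gamma = min(eta * K, 0.5)
    scores = [0.0] * K  # cumulative estimated (Borda-style) scores
    history = []
    for _ in range(T):
        # Exponential weights (shifted by the max for numerical stability),
        # mixed with uniform exploration of rate gamma.
        m = max(scores)
        w = [math.exp(eta * (s - m)) for s in scores]
        tot = sum(w)
        q = [(1 - gamma) * wi / tot + gamma / K for wi in w]
        # Draw both arms of the duel i.i.d. from q and observe the outcome.
        i = rng.choices(range(K), weights=q)[0]
        j = rng.choices(range(K), weights=q)[0]
        o = 1.0 if rng.random() < pref(i, j) else 0.0
        # Simplified importance-weighted score estimate for the first-drawn arm;
        # the paper's estimator may differ in form.
        scores[i] += o / q[i]
        history.append((i, j))
    return q, history
```

Running the sketch on a toy instance where arm 0 beats every other arm with probability 0.9 concentrates the final distribution on arm 0, which is the qualitative behavior the experiments in the paper measure via Borda regret.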