Adversarial Dueling Bandits

Authors: Aadirupa Saha, Tomer Koren, Yishay Mansour

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we corroborate the theoretical results with empirical evaluations. In this section we evaluate the empirical performance of our proposed algorithm Dueling-EXP3 and compare it with the only other existing adversarial dueling bandit algorithm, REX3, although it is known to work only under the restricted class of linear utility based preferences (Gajane et al., 2015; Ailon et al., 2014).
Researcher Affiliation | Collaboration | 1 Microsoft Research, New York City; 2 Blavatnik School of Computer Science, Tel Aviv University, and Google Research Tel Aviv. Correspondence to: Aadirupa Saha <aadirupa.saha@microsoft.com>.
Pseudocode | Yes | Algorithm 1: Dueling-EXP3 (D-EXP3); Algorithm 2: Dueling-EXP3 (High Probability); Algorithm 3: Borda-Confidence-Bound (BCB)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets | No | The paper describes generating synthetic data for experiments (e.g., "Switching Borda or SB(τ)", "Random-walk preferences or RW(ν)", "Lower Bound instance or LB(ϵ)") rather than using a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes setting up and running simulations in an online learning (bandit) environment. It does not mention or specify traditional training, validation, or test dataset splits (e.g., percentages, sample counts) as would be common in supervised learning tasks.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | Algorithm 1 Dueling-EXP3 (D-EXP3). Input: item set indexed by [K], learning rate η > 0, parameter γ ∈ (0, 1). Initialize: initial probability distribution q_1(i) = 1/K for all i ∈ [K]. Let η = ((log K)/(TK))^(2/3) and γ = ηK. For any T, the expected regret of Algorithm 1 satisfies E[R_T] ≤ 6(K log K)^(1/3) T^(2/3). The explicit values used for τ, ν, ϵ are specified in the corresponding figures.
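The setup above can be sketched in code. This is a minimal illustrative implementation of a Dueling-EXP3-style learner against a fixed preference matrix, assuming the first arm is drawn from the γ-mixed exponential-weights distribution, the second arm uniformly at random, and an importance-weighted estimate of each arm's Borda-style gain; the function name `d_exp3` and its interface are assumptions, not the authors' released code.

```python
import numpy as np

def d_exp3(pref, T, seed=None):
    """Sketch of a Dueling-EXP3-style learner. `pref[i, j]` is the
    probability that arm i beats arm j in a duel."""
    rng = np.random.default_rng(seed)
    K = pref.shape[0]
    eta = (np.log(K) / (T * K)) ** (2 / 3)   # tuning from the paper's statement
    gamma = min(eta * K, 1.0)                # exploration rate, clipped for small T
    q = np.full(K, 1.0 / K)                  # exponential-weights distribution
    gains = np.zeros(K)                      # cumulative importance-weighted gains
    for _ in range(T):
        q_mix = (1 - gamma) * q + gamma / K  # explore-mix distribution
        i = rng.choice(K, p=q_mix)           # left arm from the mixture
        j = rng.integers(K)                  # right arm uniformly at random
        o = rng.random() < pref[i, j]        # duel outcome: did i beat j?
        gains[i] += float(o) / q_mix[i]      # importance-weighted gain estimate
        # exponential-weights update, shifted by the max for numerical stability
        w = np.exp(eta * (gains - gains.max()))
        q = w / w.sum()
    return q
```

Usage: on a stationary matrix where one arm dominates, the returned distribution concentrates on the Borda winner as T grows.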