Adversarial Dueling Bandits

Authors: Aadirupa Saha, Tomer Koren, Yishay Mansour

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we corroborate the theoretical results with empirical evaluations. In this section we evaluate the empirical performance of our proposed algorithm Dueling-EXP3 and compare it with the only other existing adversarial dueling bandit algorithm, REX3, although it is known to work only under the restricted class of linear utility based preferences (Gajane et al., 2015; Ailon et al., 2014).
Researcher Affiliation | Collaboration | 1 Microsoft Research, New York City; 2 Blavatnik School of Computer Science, Tel Aviv University, and Google Research Tel Aviv. Correspondence to: Aadirupa Saha <aadirupa.saha@microsoft.com>.
Pseudocode | Yes | Algorithm 1: Dueling-EXP3 (D-EXP3); Algorithm 2: Dueling-EXP3 (High Probability); Algorithm 3: Borda-Confidence-Bound (BCB)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets | No | The paper describes generating synthetic data for experiments (e.g., "Switching Borda or SB(τ)", "Random-walk preferences or RW(ν)", "Lower Bound instance or LB(ϵ)") rather than using a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes setting up and running simulations in an online learning (bandit) environment. It does not mention or specify traditional training, validation, or test dataset splits (e.g., percentages, sample counts) as would be common in supervised learning tasks.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | Algorithm 1 Dueling-EXP3 (D-EXP3). Input: item set indexed by [K], learning rate η > 0, parameter γ ∈ (0, 1). Initialize: initial probability distribution q_1(i) = 1/K for all i ∈ [K]. Let η = ((log K)/(TK))^(2/3) and γ = ηK. For any T, the expected regret of Algorithm 1 satisfies E[R_T] ≤ 6(K log K)^(1/3) T^(2/3). The explicit values used for τ, ν, ϵ are specified in the corresponding figures.
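The setup above can be sketched in code. This is a minimal illustrative implementation of a Dueling-EXP3-style learner against a fixed preference matrix, assuming the first arm is drawn from the γ-mixed exponential-weights distribution, the second arm uniformly at random, and an importance-weighted estimate of each arm's Borda-style gain; the function name `d_exp3` and its interface are assumptions, not the authors' released code.

```python
import numpy as np

def d_exp3(pref, T, seed=None):
    """Sketch of a Dueling-EXP3-style learner. `pref[i, j]` is the
    probability that arm i beats arm j in a duel."""
    rng = np.random.default_rng(seed)
    K = pref.shape[0]
    eta = (np.log(K) / (T * K)) ** (2 / 3)   # tuning from the paper's statement
    gamma = min(eta * K, 1.0)                # exploration rate, clipped for small T
    q = np.full(K, 1.0 / K)                  # exponential-weights distribution
    gains = np.zeros(K)                      # cumulative importance-weighted gains
    for _ in range(T):
        q_mix = (1 - gamma) * q + gamma / K  # explore-mix distribution
        i = rng.choice(K, p=q_mix)           # left arm from the mixture
        j = rng.integers(K)                  # right arm uniformly at random
        o = rng.random() < pref[i, j]        # duel outcome: did i beat j?
        gains[i] += float(o) / q_mix[i]      # importance-weighted gain estimate
        # exponential-weights update, shifted by the max for numerical stability
        w = np.exp(eta * (gains - gains.max()))
        q = w / w.sum()
    return q
```

Usage: on a stationary matrix where one arm dominates, the returned distribution concentrates on the Borda winner as T grows.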