Reducing Dueling Bandits to Cardinal Bandits

Authors: Nir Ailon, Zohar Karnin, Thorsten Joachims

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental For Doubler and Multi SBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
Researcher Affiliation Collaboration Nir Ailon NAILON@TECHNION.AC.IL Technion, Dept. of Computer Science, Haifa 32000, Israel Zohar Karnin ZKARNIN@YMAIL.COM Yahoo Labs, Haifa 31905, Israel Thorsten Joachims TJ@CS.CORNELL.EDU Cornell University, Dept. of Computer Science, Ithaca, NY 14850, USA
Pseudocode Yes Algorithm 1 UCB algorithm for MAB with |X| = K arms.; Algorithm 2 (Doubler): Reduction for finite and infinite X with internal structure.; Algorithm 3 (Multi SBM): Reduction for unstructured finite X by using K SBMs in parallel.; Algorithm 4 (Sparring): Reduction to two SBMs.
Open Source Code No The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets Yes For this comparison, we shall use the same matrix (ϵxy)x,y X as in (Yue & Joachims, 2011), which was empirically estimated from an operational search engine.
Dataset Splits No The paper describes a simulation-based experiment and does not mention explicit train/validation/test splits with percentages, sample counts, or predefined split citations.
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers, such as programming languages or libraries, used for its experiments.
Experiment Setup Yes Henceforth, the set X of arms is {A, B, C, D, E, F}. We considered 5 choices of the expected value function µ( ) and 3 link functions... For each 15 combinations of arm values and link function we ran all 5 algorithms IF, BTMB, Multi SBM, Doubler, and Sparring with random inputs spanning a time horizon of up to 32000... averaged over 400 executions, with one standard deviation confidence bars.