Reducing Dueling Bandits to Cardinal Bandits
Authors: Nir Ailon, Zohar Karnin, Thorsten Joachims
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For Doubler and Multi SBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms. |
| Researcher Affiliation | Collaboration | Nir Ailon NAILON@TECHNION.AC.IL Technion, Dept. of Computer Science, Haifa 32000, Israel Zohar Karnin ZKARNIN@YMAIL.COM Yahoo Labs, Haifa 31905, Israel Thorsten Joachims TJ@CS.CORNELL.EDU Cornell University, Dept. of Computer Science, Ithaca, NY 14850, USA |
| Pseudocode | Yes | Algorithm 1 UCB algorithm for MAB with |X| = K arms.; Algorithm 2 (Doubler): Reduction for finite and infinite X with internal structure.; Algorithm 3 (Multi SBM): Reduction for unstructured finite X by using K SBMs in parallel.; Algorithm 4 (Sparring): Reduction to two SBMs. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | Yes | For this comparison, we shall use the same matrix (ϵxy)x,y X as in (Yue & Joachims, 2011), which was empirically estimated from an operational search engine. |
| Dataset Splits | No | The paper describes a simulation-based experiment and does not mention explicit train/validation/test splits with percentages, sample counts, or predefined split citations. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages or libraries, used for its experiments. |
| Experiment Setup | Yes | Henceforth, the set X of arms is {A, B, C, D, E, F}. We considered 5 choices of the expected value function µ( ) and 3 link functions... For each 15 combinations of arm values and link function we ran all 5 algorithms IF, BTMB, Multi SBM, Doubler, and Sparring with random inputs spanning a time horizon of up to 32000... averaged over 400 executions, with one standard deviation confidence bars. |