Batched Dueling Bandits
Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we also validate our theoretical results via experiments on synthetic and real data. |
| Researcher Affiliation | Academia | Arpit Agarwal 1 Rohan Ghuge 2 Viswanath Nagarajan 2 1Data Science Institute, Columbia University, New York, USA 2Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, USA. |
| Pseudocode | Yes | Algorithm 1 PCOMP(ALL PAIRS COMPARISONS) [...] Algorithm 2 SCOMP(SEEDED COMPARISONS) [...] Algorithm 3 SCOMP2 (SEEDED COMPARISONS 2) |
| Open Source Code | No | The paper mentions using a 'dueling bandit library due to (Komiyama et al., 2015)' for comparison but does not state that the authors' own code or implementations are open-source or provide a link. |
| Open Datasets | Yes | Sushi. The Sushi dataset is based on the Sushi preference dataset (Kamishima, 2003) that contains the preference data regarding 100 types of Sushi. |
| Dataset Splits | No | The paper describes an online learning problem (dueling bandits) evaluated on cumulative regret over a time horizon. It does not mention traditional train/validation/test splits of a static dataset for model tuning or evaluation. |
| Hardware Specification | Yes | We conducted our computations using C++ and Python 2.7 with a 2.3 Ghz Intel Core i5 processor and 16 GB 2133 MHz LPDDR3 memory. |
| Software Dependencies | No | The paper mentions 'C++ and Python 2.7' and 'the dueling bandit library due to (Komiyama et al., 2015)', but it provides a version number only for Python, not for the C++ compiler or the cited library. |
| Experiment Setup | Yes | We set T = 10^5, δ = 1/TK^2 and B = log(T) = 16. We set α = 0.51 for RUCB, and f(K) = 0.3K^1.01 for RMED1, and γ = 1.3 for BTM. |
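The reported setup can be reproduced numerically with a short sketch. The variable names below are ours, not the paper's, and the batch count B = log(T) = 16 is consistent with a base-2 logarithm rounded down (log2(10^5) ≈ 16.6); that interpretation is an assumption.

```python
import math

# Illustrative reconstruction of the reported experiment configuration.
# Names (T, K, delta, B, ...) are ours; only the values come from the paper.
T = 10**5                      # time horizon
K = 100                        # number of arms (e.g., 100 sushi types)
delta = 1.0 / (T * K**2)       # confidence parameter delta = 1/(T K^2)
B = math.floor(math.log2(T))   # batches: floor(log2(10^5)) = 16 (assumed base 2)

# Baseline hyperparameters as stated in the paper
alpha_rucb = 0.51              # alpha for RUCB
f_K_rmed1 = 0.3 * K**1.01      # f(K) for RMED1
gamma_btm = 1.3                # gamma for BTM

print(B)          # 16
print(delta)      # 1e-09
```

This matches the stated B = 16 only under the base-2 reading; a natural or base-10 logarithm would give roughly 11.5 or 5, so base 2 is the consistent choice here.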