An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Authors: Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we run computational experiments to validate our theoretical results. We show that C2B, using O(log T) batches, achieves almost the same performance as fully sequential algorithms (which effectively use T batches) over a variety of real datasets. |
| Researcher Affiliation | Academia | Arpit Agarwal Columbia University arpit.agarwal@columbia.edu Rohan Ghuge University of Michigan rghuge@umich.edu Viswanath Nagarajan University of Michigan viswa@umich.edu |
| Pseudocode | Yes | Algorithm 1 C2B (CATCHING THE CONDORCET WINNER IN BATCHES) |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include the code in the supplemental material. |
| Open Datasets | Yes | We perform these experiments using the following real-world datasets. Six rankers. This dataset is based on the 6 retrieval functions used in the engine of ArXiv.org. Sushi. The Sushi dataset is based on the Sushi preference dataset [34] that contains the preference data regarding 100 types of Sushi. ... The Irish election data for Dublin and Meath is available at preflib.org. ... the Microsoft Learning to Rank (MSLR) dataset [40] and the Yahoo! Learning to Rank Challenge Set 1 [16]. |
| Dataset Splits | No | The paper sets a time horizon T for the experiments and discusses parameters like 'α = 0.51 for RUCB, and f(K) = 0.3K^1.01 for RMED and C2B, and γ = 1.3 for BTM'. However, it does not provide explicit details about data splits (e.g., train/validation/test percentages or counts) for reproducibility, which is common in many machine learning contexts. |
| Hardware Specification | No | The provided text does not include specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. While the checklist indicates such details are in the appendix, they are not present in the given document excerpt. |
| Software Dependencies | No | The paper mentions using a 'KL-divergence based confidence bound due to [35]' and 'the library due to [35]' but does not provide specific software names with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x) for its implementation or experiments. |
| Experiment Setup | Yes | We set α = 0.51 for RUCB, and f(K) = 0.3K^1.01 for RMED and C2B, and γ = 1.3 for BTM: these parameters are known to perform well both theoretically and empirically [35]. We set T = 10^6 for MSLR30 and Yahoo30 datasets (as they have larger number of arms), and T = 10^5 for the remaining four. |
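The experiment-setup parameters quoted above can be collected into a small configuration sketch. This is an illustrative reconstruction only: the names `EXPERIMENT_CONFIG`, `rmed_f`, and `horizon_for` are hypothetical and do not come from the paper's supplemental code.

```python
# Hypothetical sketch of the experiment settings reported in the table above.
# All identifiers here are illustrative, not taken from the authors' code.

def rmed_f(K: int) -> float:
    """Exploration function f(K) = 0.3 * K^1.01, used by RMED and C2B."""
    return 0.3 * K ** 1.01

EXPERIMENT_CONFIG = {
    "RUCB": {"alpha": 0.51},   # confidence parameter alpha
    "RMED": {"f": rmed_f},
    "C2B":  {"f": rmed_f},
    "BTM":  {"gamma": 1.3},    # relaxation parameter gamma
}

def horizon_for(dataset: str) -> int:
    """Time horizon T: 10^6 for the larger MSLR30/Yahoo30 datasets, 10^5 otherwise."""
    return 10**6 if dataset in ("MSLR30", "Yahoo30") else 10**5
```

A harness reproducing the paper's comparison would loop over `EXPERIMENT_CONFIG` and run each algorithm for `horizon_for(dataset)` rounds on the six datasets listed in the Open Datasets row.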