Batched Coarse Ranking in Multi-Armed Bandits
Authors: Nikolai Karpov, Qin Zhang
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce and study the problem of batched coarse ranking in multi-armed bandits. We propose two novel algorithms based on the principle of optimism in the face of uncertainty and Thompson Sampling, and provide theoretical guarantees for their worst-case query complexity. We present extensive numerical results demonstrating improved performance over existing methods on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Andrew L. Liu (University of Toronto, Carnegie Mellon University), Zhiyao Lei (University of Toronto), Gauri Joshi (Carnegie Mellon University), Venkatesan Guruswami (Amazon Web Services, Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1: BATCH-C-UCB (on page 5) Algorithm 2: BATCH-C-TS (on page 6) |
| Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | Yes | MovieLens 1M dataset [10] |
| Dataset Splits | Yes | We randomly partition the data into a training set (80%) and a testing set (20%). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers). |
| Experiment Setup | Yes | For BATCH-C-UCB and BATCH-C-TS, we choose B = 10 batches and N = 200 total samples. For the sequential algorithms, we set batch size b = 1. We consider the top 20 arms (movies) as the elite set. We set the reward threshold µ0 = 3.5, the batch size for BATCH-C-UCB and BATCH-C-TS as B = 10, the total number of samples N = 200, and the confidence parameter δ = 0.05. |