Threshold Bandits, With and Without Censored Feedback
Authors: Jacob D. Abernethy, Kareem Amin, Ruihao Zhu
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Using new tools to understand the popular UCB algorithm, we show that the uncensored case is essentially no more difficult than the classical multi-armed bandit setting. Finally, we show that the censored case exhibits more challenges, but we give guarantees in the event that the sequence of threshold values is generated optimistically. First, we provide a new perspective on the classical UCB algorithm, giving an alternative proof that relies on an interesting potential function argument; we believe this technique may be of independent interest. Second, we analyze the Threshold Bandit setting when given uncensored feedback, and we give a novel algorithm called DKWUCB based on the Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956). We show, somewhat surprisingly, that with uncensored feedback the regret bound is no worse than in the standard stochastic MAB setting, suggesting that despite the much richer policy class one has nearly the same learning challenge. Finally, we consider learning in the censored feedback setting, and propose an algorithm KMUCB, akin to the Kaplan-Meier estimator (Kaplan and Meier, 1958). (An illustrative sketch of the DKW-based idea appears after this table.) |
| Researcher Affiliation | Academia | Jacob Abernethy, Department of Computer Science, University of Michigan, Ann Arbor, MI 48109, jabernet@umich.edu; Kareem Amin, Department of Computer Science, University of Michigan, Ann Arbor, MI 48109, amkareem@umich.edu; Ruihao Zhu, AeroAstro & CSAIL, MIT, Cambridge, MA 02139, rzhu@mit.edu |
| Pseudocode | Yes | Algorithm 1: KMUCB |
| Open Source Code | No | No statement or link providing access to open-source code was found. |
| Open Datasets | No | This paper is theoretical and does not use a specific, publicly available dataset. It discusses abstract distributions and samples for theoretical analysis. |
| Dataset Splits | No | This paper is theoretical and does not involve experimental validation on datasets with explicit train/validation/test splits. |
| Hardware Specification | No | No specific hardware used for experiments is mentioned in the paper, as it focuses on theoretical contributions. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned, as the paper is primarily theoretical. |
| Experiment Setup | No | No experimental setup details, such as hyperparameters or system-level training settings, are provided, as the paper focuses on theoretical analysis rather than empirical experiments. |
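
To make the uncensored-feedback result quoted above more concrete, below is a minimal, hypothetical sketch of a DKW-style optimistic index for the threshold bandit. This is not the paper's DKWUCB algorithm (whose confidence-radius schedule and regret analysis are given in the paper); the fixed `delta` parameter, the sampler interface, and the random tie-breaking rule are assumptions made purely for illustration.

```python
import math
import random
from collections import defaultdict

def dkw_radius(n, delta):
    """DKW confidence radius: the sup-norm deviation of the empirical CDF
    from the true CDF exceeds this with probability at most delta
    (Dvoretzky et al., 1956)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def empirical_survival(samples, threshold):
    """Fraction of observed samples strictly above the threshold (1 - empirical CDF)."""
    return sum(1 for x in samples if x > threshold) / len(samples)

def dkw_ucb_sketch(arms, thresholds, delta=0.05, seed=0):
    """Illustrative optimistic index for the *uncensored* threshold bandit:
    each round a threshold c_t is revealed, we pull the arm with the largest
    optimistic estimate of P(X > c_t), and the drawn sample is always observed.
    `arms` is a list of zero-argument samplers standing in for the unknown
    reward distributions (a hypothetical interface, not from the paper)."""
    rng = random.Random(seed)
    history = defaultdict(list)              # per-arm observed samples
    pulls = []
    for c in thresholds:
        indices = []
        for k in range(len(arms)):
            if not history[k]:               # pull every arm once before indexing
                indices.append((float("inf"), k))
                continue
            n = len(history[k])
            ucb = empirical_survival(history[k], c) + dkw_radius(n, delta)
            indices.append((min(1.0, ucb), k))
        best = max(indices)[0]               # optimistic choice; ties broken at random
        k_star = rng.choice([k for v, k in indices if v == best])
        history[k_star].append(arms[k_star]())   # uncensored: sample always observed
        pulls.append(k_star)
    return pulls

if __name__ == "__main__":
    rng = random.Random(1)
    arms = [lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(0.5, 1.0)]
    thresholds = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    print(dkw_ucb_sketch(arms, thresholds)[:20])
```

The point of the sketch is the role of the DKW inequality: with uncensored feedback, each arm's empirical CDF is uniformly close to the true CDF over all thresholds at once, so a single confidence radius covers every possible threshold value, which is what makes the setting comparable to the classical stochastic MAB despite the richer policy class.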