Threshold Bandits, With and Without Censored Feedback

Authors: Jacob D. Abernethy, Kareem Amin, Ruihao Zhu

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | First, we provide a new perspective on the classical UCB algorithm, giving an alternative proof that relies on an interesting potential function argument; we believe this technique may be of independent interest. Second, we analyze the Threshold Bandit setting when given uncensored feedback, and we give a novel algorithm called DKWUCB based on the Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956). We show, somewhat surprisingly, that with uncensored feedback the regret bound is no worse than in the standard stochastic MAB setting, suggesting that despite the much richer policy class one faces nearly the same learning challenge. Finally, we consider learning in the censored feedback setting and propose an algorithm, KMUCB, akin to the Kaplan-Meier estimator (Kaplan and Meier, 1958). Using new tools to understand the popular UCB algorithm, we show that the uncensored case is essentially no more difficult than the classical multi-armed bandit setting, while the censored case exhibits more challenges; we give guarantees in the event that the sequence of threshold values is generated optimistically. (A minimal sketch of the DKW-style confidence bound appears after the table.)
Researcher Affiliation | Academia | Jacob Abernethy, Department of Computer Science, University of Michigan, Ann Arbor, MI 48109 (jabernet@umich.edu); Kareem Amin, Department of Computer Science, University of Michigan, Ann Arbor, MI 48109 (amkareem@umich.edu); Ruihao Zhu, AeroAstro & CSAIL, MIT, Cambridge, MA 02139 (rzhu@mit.edu)
Pseudocode | Yes | Pseudocode is provided for Algorithm 1 (KMUCB). (A minimal sketch of the Kaplan-Meier estimator appears after the table.)
Open Source Code | No | No statement or link providing access to open-source code was found.
Open Datasets | No | The paper is theoretical and does not use a specific, publicly available dataset; it discusses abstract distributions and samples for theoretical analysis.
Dataset Splits | No | The paper is theoretical and does not involve experimental validation on datasets with explicit train/validation/test splits.
Hardware Specification | No | No specific hardware used for experiments is mentioned, as the paper focuses on theoretical contributions.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned, as the paper is primarily theoretical.
Experiment Setup | No | No experimental setup details, such as hyperparameters or system-level training settings, are provided; the paper focuses on theoretical analysis rather than empirical experiments.
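
To make the DKWUCB excerpt above concrete, here is a minimal sketch in Python, assuming the uncensored setting the paper describes: each arm stores the samples it has produced, the Dvoretzky-Kiefer-Wolfowitz inequality bounds the sup-norm error of the empirical CDF, and the learner plays the arm with the largest optimistic probability of clearing the round's threshold. The class name DKWUCBSketch, the fixed confidence level delta, and the toy Gaussian arms are illustrative assumptions, not the paper's exact algorithm, which chooses its confidence radius as part of the regret analysis.

import math
import random
from bisect import bisect_left, insort

class DKWUCBSketch:
    """Hypothetical DKW-style index policy for uncensored threshold bandits."""

    def __init__(self, n_arms, delta=0.05):
        self.delta = delta                           # confidence level (assumed fixed here)
        self.samples = [[] for _ in range(n_arms)]   # sorted samples per arm

    def select(self, threshold):
        best_arm, best_ucb = 0, -1.0
        for arm, s in enumerate(self.samples):
            if not s:
                return arm                           # play every arm once first
            n = len(s)
            # Empirical probability that a fresh sample clears the threshold.
            tail = 1.0 - bisect_left(s, threshold) / n
            # DKW: sup-norm CDF error <= sqrt(log(2/delta) / (2n)) w.p. >= 1 - delta.
            radius = math.sqrt(math.log(2.0 / self.delta) / (2.0 * n))
            ucb = min(1.0, tail + radius)
            if ucb > best_ucb:
                best_arm, best_ucb = arm, ucb
        return best_arm

    def update(self, arm, sample):
        insort(self.samples[arm], sample)            # uncensored: sample always observed

# Toy run: two Gaussian arms, a fresh threshold revealed each round.
random.seed(0)
bandit = DKWUCBSketch(n_arms=2)
means = [0.4, 0.6]
for t in range(1000):
    c = random.random()                              # threshold for this round
    arm = bandit.select(c)
    x = random.gauss(means[arm], 0.2)                # sample from the chosen arm
    reward = 1.0 if x >= c else 0.0                  # unit payoff iff threshold is met
    bandit.update(arm, x)

Because the samples are kept sorted, the empirical tail probability at any threshold is a single binary search, so each round costs O(log n) per arm beyond the insertion.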
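
Similarly, since KMUCB is described as akin to the Kaplan-Meier estimator (Kaplan and Meier, 1958), a minimal sketch of the product-limit estimator may clarify the underlying estimate. The function name kaplan_meier and the toy data are illustrative; the paper's Algorithm 1 presumably combines a Kaplan-Meier-style estimate from censored observations with a UCB-style bonus rather than using the raw estimator as written here.

from collections import Counter

def kaplan_meier(observations):
    """Product-limit survival estimate from right-censored data.

    `observations` is a list of (value, observed) pairs; observed=False
    means the sample was censored at `value`. Returns the survival curve
    as a list of (value, S(value)) steps at the event values."""
    events = Counter(v for v, observed in observations if observed)
    removed = Counter(v for v, _ in observations)
    at_risk = len(observations)                # size of the current risk set
    steps, surv = [], 1.0
    for v in sorted(removed):
        d = events.get(v, 0)
        if d:                                  # survival drops only at event values
            surv *= 1.0 - d / at_risk
            steps.append((v, surv))
        at_risk -= removed[v]                  # events and censorings both leave the risk set
    return steps

# Example: three events and two censored observations.
data = [(2.0, True), (3.0, False), (4.0, True), (5.0, True), (5.5, False)]
print(kaplan_meier(data))   # [(2.0, 0.8), (4.0, 0.533...), (5.0, 0.266...)]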