Fighting Bandits with a New Kind of Smoothness
Authors: Jacob D. Abernethy, Chansoo Lee, Ambuj Tewari
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O(√(NT)) minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(√(NT log N)), as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms. (A numerical sketch of the bounded-hazard property appears after the table.) |
| Researcher Affiliation | Academia | Jacob Abernethy, University of Michigan, jabernet@umich.edu; Chansoo Lee, University of Michigan, chansool@umich.edu; Ambuj Tewari, University of Michigan, tewaria@umich.edu |
| Pseudocode | Yes | Framework 1: Gradient-Based Prediction Algorithm (GBPA) Template for the Multi-Armed Bandit. GBPA(Φ): Φ is a differentiable convex function such that ∇Φ ∈ Δ_N and ∇_i Φ > 0 for all i. Initialize Ĝ_0 = 0. For t = 1 to T: Nature: a loss vector g_t ∈ [−1, 0]^N is chosen by the Adversary. Sampling: Learner chooses i_t according to the distribution p(Ĝ_{t−1}) = ∇Φ(Ĝ_{t−1}). Cost: Learner incurs loss g_{t,i_t}. Estimation: Learner guesses ĝ_t := (g_{t,i_t} / p_{i_t}(Ĝ_{t−1})) e_{i_t}. Update: Ĝ_t = Ĝ_{t−1} + ĝ_t. (A runnable sketch of this template appears after the table.) |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link, an explicit code release statement, or mention of code in supplementary materials) for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve experimental evaluation on datasets. Therefore, no information regarding publicly available datasets for training was found. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, thus no information on training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications were mentioned. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or training configurations. |
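
Below is a minimal, runnable sketch of the GBPA template quoted above, instantiated with Gumbel perturbations so that the sampling distribution ∇Φ(Ĝ) has a closed softmax form (the Gumbel-max trick); this instance coincides with EXP3-style exponential weights, which the abstract notes arise as a special case. The function name `gbpa_gumbel` and the learning-rate parameter `eta` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def gbpa_gumbel(loss_matrix, eta=0.1, seed=0):
    """GBPA with Gumbel smoothing on a T x N matrix of losses in [-1, 0].

    For Gumbel perturbations the smoothed-potential gradient is the
    softmax of eta * G_hat, so p(G_hat) is available in closed form.
    """
    rng = np.random.default_rng(seed)
    T, N = loss_matrix.shape
    G_hat = np.zeros(N)   # cumulative loss estimate, G_hat_0 = 0
    total_loss = 0.0
    for t in range(T):
        # Sampling: p(G_hat_{t-1}) = grad Phi(G_hat_{t-1}) = softmax(eta * G_hat)
        z = eta * G_hat
        p = np.exp(z - z.max())
        p /= p.sum()
        i_t = rng.choice(N, p=p)
        # Cost: learner incurs loss g_{t, i_t} (losses lie in [-1, 0])
        g = loss_matrix[t, i_t]
        total_loss += g
        # Estimation: importance-weighted guess g_hat_t = (g_{t,i_t} / p_{i_t}) e_{i_t}
        g_hat = np.zeros(N)
        g_hat[i_t] = g / p[i_t]
        # Update: G_hat_t = G_hat_{t-1} + g_hat_t
        G_hat += g_hat
    return total_loss
```

For other hazard-bounded perturbations, p_{i_t}(Ĝ_{t−1}) generally has no closed form and would have to be estimated, e.g., by Monte Carlo over argmax(Ĝ + Z); the Gumbel case is used here precisely because it avoids that step.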
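
The abstract's key property is a bounded hazard rate h(x) = f(x)/(1 − F(x)). The following sketch (an illustration, not the paper's analysis) evaluates the hazard of the five named families on a grid using scipy; the shape parameters are assumptions made for this sketch, and the paper restricts the admissible parameter ranges.

```python
import numpy as np
from scipy import stats

# Illustrative members of the five families named in the abstract; the
# shape parameters are assumptions for this sketch, not the paper's choices.
distributions = {
    "Gumbel":  stats.gumbel_r(),
    "Weibull": stats.weibull_min(c=1.0),   # c = 1 is the exponential case
    "Frechet": stats.invweibull(c=2.0),
    "Pareto":  stats.pareto(b=2.0),
    "Gamma":   stats.gamma(a=2.0),
}

for name, dist in distributions.items():
    # Hazard rate h(x) = pdf(x) / sf(x), evaluated over the bulk of the support.
    xs = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 2000)
    hazard = dist.pdf(xs) / dist.sf(xs)
    print(f"{name:8s} sup hazard on grid ~= {hazard.max():.3f}")
```

A grid maximum only illustrates boundedness rather than proving it; by contrast, the Gaussian hazard rate grows without bound as x increases, which is consistent with its absence from the abstract's list.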