Fighting Bandits with a New Kind of Smoothness

Authors: Jacob D. Abernethy, Chansoo Lee, Ambuj Tewari

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O(√(NT)) minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieves a near-optimal regret as low as O(√(NT log N)), as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms.
Researcher Affiliation | Academia | Jacob Abernethy (University of Michigan, jabernet@umich.edu); Chansoo Lee (University of Michigan, chansool@umich.edu); Ambuj Tewari (University of Michigan, tewaria@umich.edu)
Pseudocode | Yes | Framework 1: Gradient-Based Prediction Algorithm (GBPA) Template for Multi-Armed Bandit

GBPA(Φ̃): Φ̃ is a differentiable convex function such that ∇Φ̃ ∈ Δ^N and ∇_i Φ̃ > 0 for all i.
  Initialize Ĝ_0 = 0
  for t = 1 to T do
    Nature: A loss vector g_t ∈ [−1, 0]^N is chosen by the Adversary
    Sampling: Learner chooses i_t according to the distribution p(Ĝ_{t−1}) = ∇Φ̃(Ĝ_{t−1})
    Cost: Learner gains loss g_{t,i_t}
    Estimation: Learner guesses ĝ_t := (g_{t,i_t} / p_{i_t}(Ĝ_{t−1})) e_{i_t}
    Update: Ĝ_t = Ĝ_{t−1} + ĝ_t
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link, an explicit code release statement, or mention of code in supplementary materials) for the described methodology.
Open Datasets | No | The paper is theoretical and does not involve experimental evaluation on datasets; therefore, no information regarding publicly available training datasets was found.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, so no information on training, validation, or test splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments; therefore, no hardware specifications were mentioned.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or training configurations.
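
The GBPA pseudocode in the table above can be sketched as runnable code. The following is a minimal, illustrative instantiation, not the paper's reference implementation: it picks p(Ĝ) = softmax(η·Ĝ), which is the gradient of a Gumbel-smoothed max potential and corresponds to EXP3-style sampling; the step size `eta` and all function names are assumptions for the sketch.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def gbpa(loss_fn, N, T, eta=1.0, rng=None):
    """Sketch of the GBPA template with softmax (EXP3-style) sampling.

    loss_fn(t) must return the adversary's loss vector g_t in [-1, 0]^N.
    Using p(G_hat) = softmax(eta * G_hat) is one concrete choice of
    smoothed potential gradient; eta is an illustrative parameter.
    """
    rng = rng or np.random.default_rng()
    G_hat = np.zeros(N)                # Initialize G-hat_0 = 0
    total_loss = 0.0
    for t in range(T):
        g = loss_fn(t)                 # Nature: loss vector g_t in [-1, 0]^N
        p = softmax(eta * G_hat)       # Sampling: distribution p(G-hat_{t-1})
        i = rng.choice(N, p=p)         # Learner draws arm i_t ~ p
        total_loss += g[i]             # Cost: learner "gains" loss g_{t,i_t}
        g_hat = np.zeros(N)
        g_hat[i] = g[i] / p[i]         # Estimation: importance-weighted guess
        G_hat += g_hat                 # Update: G-hat_t = G-hat_{t-1} + g-hat_t
    return total_loss
```

Against a fixed adversary where one arm always incurs loss −1 and the others incur 0, the learner's importance-weighted updates quickly push probability mass away from the bad arm, so the cumulative loss stays far above the −T worst case.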
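The perturbation methods mentioned in the abstract can be illustrated with the Gumbel distribution, one of the listed bounded-hazard examples. The sketch below (names and the scale parameter `eta` are assumptions) shows only the sampling step: by the Gumbel-max trick, adding i.i.d. Gumbel noise to the cumulative estimates and taking the argmax samples arm i with probability proportional to exp(η·Ĝ_i), recovering exponential-weights (EXP3-style) sampling.

```python
import numpy as np

def gumbel_perturbed_choice(G_hat, eta=1.0, rng=None):
    """Choose an arm by perturbing cumulative estimates with Gumbel noise.

    By the Gumbel-max trick, argmax_i(eta*G_hat[i] + Gumbel_i) selects arm i
    with probability proportional to exp(eta * G_hat[i]); eta is illustrative.
    """
    rng = rng or np.random.default_rng()
    noise = rng.gumbel(size=len(G_hat))    # i.i.d. standard Gumbel perturbations
    return int(np.argmax(eta * np.asarray(G_hat) + noise))
```

For example, with cumulative estimates [0, 0, 5], arm 2 should be chosen with probability exp(5)/(exp(5) + 2) ≈ 0.987.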