Taming Heavy-Tailed Losses in Adversarial Bandits and the Best-of-Both-Worlds Setting

Authors: Duo Cheng, Xingyu Zhou, Bo Ji

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the multi-armed bandit problem in the best-of-both-worlds (BOBW) setting with heavy-tailed losses, where the losses can be negative and unbounded but have (1+v)-th raw moments bounded by u^{1+v} for some known u > 0 and v ∈ (0, 1]. We propose an algorithm and prove that it achieves a T^{1/(1+v)}-type worst-case (pseudo-)regret in the adversarial regime and a log T-type gap-dependent regret in the stochastic regime, where T is the time horizon.
Researcher Affiliation | Academia | Duo Cheng (Virginia Tech, duocheng@vt.edu); Xingyu Zhou (Wayne State University, xingyu.zhou@wayne.edu); Bo Ji (Virginia Tech, boji@vt.edu)
Pseudocode | Yes | Algorithm 1: OMD with log-barrier and increasing learning rates for heavy-tailed MABs (OMD-LB-HT); Algorithm 2: SAO for heavy-tailed MABs (SAO-HT)
Open Source Code | No | The paper does not contain any statement about releasing open-source code for the described methodology, nor a link to a repository.
Open Datasets | No | The paper is theoretical and does not describe experiments using a dataset. It refers to "real-world data from application domains such as finance (Cont, 2001) and imaging (Hamza & Krim, 2001)" as examples of heavy-tailed distributions, but these are not used as datasets for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not describe experiments with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware.
Software Dependencies | No | The paper is theoretical and does not mention software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and defines algorithm parameters (e.g., the initial learning rate η and the trimming threshold M_{t,i}) as part of the algorithm design, rather than as experimental setup details such as hyperparameters for empirical runs.