Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
Authors: Jiatai Huang, Yan Dai, Longbo Huang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we generalize the concept of heavytailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th (1 < α 2) moments bounded by σα, while the variances may not exist. Specifically, we design an algorithm HTINF, when the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. When α, σ are unknown, HTINF achieves a log T-style instancedependent regret in stochastic cases and o(T) noregret guarantee in adversarial cases. We further develop an algorithm Ada TINF, achieving O(σK1 1/αT 1/α) minimax optimal regret even in adversarial settings, without prior knowledge on α and σ. This result matches the known regret lower-bound (Bubeck et al., 2013), which assumed a stochastic environment and α and σ are both known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-ofboth-worlds regret guarantee, and Ada TINF is the first algorithm that can adapt to both α and σ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation. |
| Researcher Affiliation | Academia | Jiatai Huang * 1 Yan Dai * 1 Longbo Huang 1 1Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Heavy-Tail Tsallis-INF (HTINF) Algorithm 2 Optimistic HTINF (Opt TINF) Algorithm 3 Adaptive Tsallis-INF (Ada TINF) |
| Open Source Code | No | The paper is a theoretical work focusing on algorithm design and analysis, and it does not mention the release of any source code or provide links to a code repository. |
| Open Datasets | No | The paper focuses on theoretical algorithm design and regret analysis, not on empirical training using specific datasets. Therefore, no information about publicly available training datasets is provided. |
| Dataset Splits | No | The paper is theoretical and analyzes algorithms. It does not involve empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is purely theoretical, focusing on algorithm design and analysis. It does not describe any experimental setup or specify hardware used. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any software dependencies or specific version numbers relevant to experimental replication. |
| Experiment Setup | No | The paper describes the theoretical setup of multi-armed bandit problems and the properties of the proposed algorithms but does not provide details of an empirical experiment setup, such as hyperparameters or system-level training settings. |