Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Authors: Jiatai Huang, Yan Dai, Longbo Huang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th (1 < α ≤ 2) moments bounded by σ^α, while the variances may not exist. Specifically, we design an algorithm HTINF: when the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, HTINF achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm AdaTINF, achieving O(σ K^{1-1/α} T^{1/α}) minimax-optimal regret even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which assumed a stochastic environment and that α and σ are both known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both α and σ to achieve the optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation. (The moment assumption and the minimax bound are restated in notation after the table.)
Researcher Affiliation | Academia | Jiatai Huang, Yan Dai, and Longbo Huang, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Heavy-Tail Tsallis-INF (HTINF); Algorithm 2: Optimistic HTINF (OptTINF); Algorithm 3: Adaptive Tsallis-INF (AdaTINF). (A generic Tsallis-INF sketch, for orientation only, follows the table.)
Open Source Code | No | The paper is a theoretical work focusing on algorithm design and analysis, and it does not mention the release of any source code or provide links to a code repository.
Open Datasets | No | The paper focuses on theoretical algorithm design and regret analysis, not on empirical training using specific datasets. Therefore, no information about publicly available training datasets is provided.
Dataset Splits | No | The paper is theoretical and analyzes algorithms. It does not involve empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is purely theoretical, focusing on algorithm design and analysis. It does not describe any experimental setup or specify hardware used.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any software dependencies or specific version numbers relevant to experimental replication.
Experiment Setup | No | The paper describes the theoretical setup of multi-armed bandit problems and the properties of the proposed algorithms but does not provide details of an empirical experiment setup, such as hyperparameters or system-level training settings.
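
To make the abstract's notation explicit, the heavy-tail moment condition and the minimax bound it quotes can be written as below. This is only a restatement of quantities named in the abstract; the symbol ℓ_t(i) for the round-t loss of arm i, with K arms and horizon T, is standard notation assumed here.

```latex
% Heavy-tail assumption: the alpha-th raw moments of the losses are bounded,
% while variances may not exist.
\mathbb{E}\left[ |\ell_t(i)|^{\alpha} \right] \le \sigma^{\alpha},
\qquad 1 < \alpha \le 2 .

% Minimax regret achieved by AdaTINF without prior knowledge of alpha or sigma,
% matching the lower bound of Bubeck et al. (2013) for the stochastic case:
\mathrm{Regret}(T) = O\!\left( \sigma\, K^{1 - 1/\alpha}\, T^{1/\alpha} \right).
```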
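
Since the listed algorithms are all Tsallis-INF variants, a minimal sketch of a plain 1/2-Tsallis-INF bandit loop may help readers orient themselves. This is not the paper's HTINF, OptTINF, or AdaTINF: those add machinery for heavy-tailed losses and for adapting to α and σ that is omitted here, and the learning-rate schedule and function names below are illustrative assumptions.

```python
import numpy as np

def tsallis_inf_probs(cum_loss_est, eta, iters=60):
    """Arm distribution for 1/2-Tsallis-INF:
    p_i = 4 / (eta * (Lhat_i - x))^2, with the scalar x < min_i Lhat_i
    found by binary search so that the p_i sum to one."""
    L = np.asarray(cum_loss_est, dtype=float)
    K = len(L)
    lo = L.min() - 2.0 * np.sqrt(K) / eta  # at lo, sum(p) <= 1
    hi = L.min() - 2.0 / eta               # at hi, sum(p) >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        if (4.0 / (eta * (L - x)) ** 2).sum() > 1.0:
            hi = x
        else:
            lo = x
    p = 4.0 / (eta * (L - lo)) ** 2
    return p / p.sum()  # renormalize away the tiny binary-search error

def run_bandit(loss_fn, K, T, seed=0):
    """Generic Tsallis-INF loop with importance-weighted loss estimates.
    `loss_fn(t, arm)` returns the observed loss of the pulled arm (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    Lhat = np.zeros(K)                    # cumulative loss estimates
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)            # illustrative learning-rate schedule
        p = tsallis_inf_probs(Lhat, eta)
        arm = int(rng.choice(K, p=p))
        loss = loss_fn(t, arm)            # only the pulled arm's loss is observed
        Lhat[arm] += loss / p[arm]        # unbiased importance-weighted update
    return Lhat
```

In the heavy-tailed setting the importance-weighted estimates loss / p[arm] can be extremely large because only the α-th moments of the losses are bounded; controlling such estimates is precisely the difficulty the HTINF family addresses, and this sketch makes no such correction.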