Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

Authors: Jiatai Huang, Yan Dai, Longbo Huang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th (1 < α ≤ 2) moments bounded by σ^α, while the variances may not exist. Specifically, we design an algorithm HTINF: when the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, HTINF achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm AdaTINF, achieving O(σ K^{1-1/α} T^{1/α}) minimax-optimal regret even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which assumed a stochastic environment and that α and σ are both known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both α and σ to achieve the optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation. (The moment assumption and the minimax bound are restated in notation after the table.)
Researcher Affiliation | Academia | Jiatai Huang, Yan Dai, and Longbo Huang, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Heavy-Tail Tsallis-INF (HTINF); Algorithm 2: Optimistic HTINF (OptTINF); Algorithm 3: Adaptive Tsallis-INF (AdaTINF). (A generic Tsallis-INF sketch, for orientation only, follows the table.)
Open Source Code | No | The paper is a theoretical work focusing on algorithm design and analysis, and it does not mention the release of any source code or provide links to a code repository.
Open Datasets | No | The paper focuses on theoretical algorithm design and regret analysis, not on empirical training using specific datasets. Therefore, no information about publicly available training datasets is provided.
Dataset Splits | No | The paper is theoretical and analyzes algorithms. It does not involve empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is purely theoretical, focusing on algorithm design and analysis. It does not describe any experimental setup or specify hardware used.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any software dependencies or specific version numbers relevant to experimental replication.
Experiment Setup | No | The paper describes the theoretical setup of multi-armed bandit problems and the properties of the proposed algorithms but does not provide details of an empirical experiment setup, such as hyperparameters or system-level training settings.
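
To make the abstract's notation explicit, the heavy-tail moment condition and the minimax bound it quotes can be written as below. This is only a restatement of quantities named in the abstract; the symbol ℓ_t(i) for the round-t loss of arm i, with K arms and horizon T, is standard notation assumed here.

```latex
% Heavy-tail assumption: the alpha-th raw moments of the losses are bounded,
% while variances may not exist.
\mathbb{E}\left[ |\ell_t(i)|^{\alpha} \right] \le \sigma^{\alpha},
\qquad 1 < \alpha \le 2 .

% Minimax regret achieved by AdaTINF without prior knowledge of alpha or sigma,
% matching the lower bound of Bubeck et al. (2013) for the stochastic case:
\mathrm{Regret}(T) = O\!\left( \sigma\, K^{1 - 1/\alpha}\, T^{1/\alpha} \right).
```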
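
Since the listed algorithms are all Tsallis-INF variants, a minimal sketch of a plain 1/2-Tsallis-INF bandit loop may help readers orient themselves. This is not the paper's HTINF, OptTINF, or AdaTINF: those add machinery for heavy-tailed losses and for adapting to α and σ that is omitted here, and the learning-rate schedule and function names below are illustrative assumptions.

```python
import numpy as np

def tsallis_inf_probs(cum_loss_est, eta, iters=60):
    """Arm distribution for 1/2-Tsallis-INF:
    p_i = 4 / (eta * (Lhat_i - x))^2, with the scalar x < min_i Lhat_i
    found by binary search so that the p_i sum to one."""
    L = np.asarray(cum_loss_est, dtype=float)
    K = len(L)
    lo = L.min() - 2.0 * np.sqrt(K) / eta  # at lo, sum(p) <= 1
    hi = L.min() - 2.0 / eta               # at hi, sum(p) >= 1
    for _ in range(iters):
        x = 0.5 * (lo + hi)
        if (4.0 / (eta * (L - x)) ** 2).sum() > 1.0:
            hi = x
        else:
            lo = x
    p = 4.0 / (eta * (L - lo)) ** 2
    return p / p.sum()  # renormalize away the tiny binary-search error

def run_bandit(loss_fn, K, T, seed=0):
    """Generic Tsallis-INF loop with importance-weighted loss estimates.
    `loss_fn(t, arm)` returns the observed loss of the pulled arm (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    Lhat = np.zeros(K)                    # cumulative loss estimates
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)            # illustrative learning-rate schedule
        p = tsallis_inf_probs(Lhat, eta)
        arm = int(rng.choice(K, p=p))
        loss = loss_fn(t, arm)            # only the pulled arm's loss is observed
        Lhat[arm] += loss / p[arm]        # unbiased importance-weighted update
    return Lhat
```

In the heavy-tailed setting the importance-weighted estimates loss / p[arm] can be extremely large because only the α-th moments of the losses are bounded; controlling such estimates is precisely the difficulty the HTINF family addresses, and this sketch makes no such correction.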