Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits
Authors: Julian Zimmert, Yevgeny Seldin
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evaluation of the algorithm demonstrating that it significantly outperforms Ucb1 and Exp3 in stochastic environments. We also provide examples of adversarial environments, where Ucb1 and Thompson Sampling exhibit almost linear regret, whereas our algorithm suffers only logarithmic regret. To the best of our knowledge, this is the first example demonstrating vulnerability of Thompson Sampling in adversarial environments. Last but not least, we present a general stochastic analysis and a general adversarial analysis of OMD algorithms with Tsallis entropy regularization for α ∈ [0, 1] and explain the reason why α = 1/2 works best. |
| Researcher Affiliation | Academia | Julian Zimmert, Yevgeny Seldin — University of Copenhagen, Copenhagen, Denmark |
| Pseudocode | Yes | Algorithm 1: Online Mirror Descent for bandits... Algorithm 2: Newton's Method approximation of w_t in Tsallis-Inf (α = 1/2) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. Phrases like 'we release our code' or direct links are absent. |
| Open Datasets | No | The paper describes experiments conducted in simulated 'stochastic MAB' and 'stochastically constrained adversarial' environments, where parameters like mean rewards and gaps are defined and varied. It does not mention using any pre-existing publicly available datasets, nor does it provide access information for any generated data. |
| Dataset Splits | No | The paper describes simulated environments rather than using pre-existing datasets with defined splits. It mentions '100 repetitions' for experiments and varying parameters (K, gaps), but not dataset splits like training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several algorithms (Ucb1, Thompson Sampling, Exp3, EXP3++, Broad) as baselines and their respective authors/papers, but does not list specific software or library names with version numbers that were used for implementation or experimentation. |
| Experiment Setup | Yes | The first experiment, shown in Figures 1 and 2, is a standard stochastic MAB, where the mean rewards are (1 + ε)/2 for the single optimal arm and (1 - ε)/2 for all the suboptimal arms. The number of arms K and the gaps ε are varied as described in the figures. ...The second experiment, shown in Figures 3 and 4, simulates stochastically constrained adversaries. The mean loss of (optimal arm, all sub-optimal arms) switches between (1-ε, 1) and (0, ε), while staying unchanged for phases that are increasing exponentially in length. ...The pseudo-regret is estimated by 100 repetitions of the corresponding experiments and two standard deviations of the empirical pseudo-regret... Ucb1 (Auer et al., 2002a, with parameter α = 1.5) and Thompson Sampling (Thompson, 1933)... EXP3++ with parametrization proposed by Seldin and Lugosi (2017) and Broad (Wei and Luo, 2018). |
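The pseudocode rows above reference the paper's two algorithms: an OMD loop and a Newton's-method solver for the sampling distribution w_t under 1/2-Tsallis entropy regularization. The following is a hedged sketch, not the paper's reference implementation: the learning-rate schedule `eta = 1/sqrt(t)` and the Newton initialization are assumptions, and the paper derives the exact constants.

```python
import numpy as np

def tsallis_inf_weights(L_hat, eta, n_iter=50, tol=1e-12):
    """Newton's-method approximation of w_t for alpha = 1/2 (in the spirit
    of the paper's Algorithm 2): find x such that
    sum_i 4 / (eta * (L_hat[i] - x))**2 == 1, with x < min(L_hat)."""
    K = len(L_hat)
    # Start just left of min(L_hat) so the weight sum exceeds 1; since
    # f(x) = sum_i w_i(x) - 1 is convex and increasing for x < min(L_hat),
    # Newton's method then moves x monotonically toward the root.
    x = np.min(L_hat) - 1.0 / (eta * np.sqrt(K))
    for _ in range(n_iter):
        w = 4.0 / (eta * (L_hat - x)) ** 2
        f = w.sum() - 1.0
        if abs(f) < tol:
            break
        x -= f / (eta * np.sum(w ** 1.5))  # f'(x) = eta * sum_i w_i^{3/2}
    w = 4.0 / (eta * (L_hat - x)) ** 2
    return w / w.sum()  # renormalize so the probabilities sum to exactly 1

def run_tsallis_inf(loss_fn, K, T, rng):
    """OMD loop with importance-weighted loss estimates (Algorithm 1 flavor).
    loss_fn(t, arm) must return a loss in [0, 1]."""
    L_hat = np.zeros(K)          # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)   # assumed schedule; the paper fixes the constant
        w = tsallis_inf_weights(L_hat, eta)
        arm = rng.choice(K, p=w)
        loss = loss_fn(t, arm)
        L_hat[arm] += loss / w[arm]  # unbiased importance-weighted estimate
        total_loss += loss
    return total_loss
```

With equal cumulative estimates the solver returns the uniform distribution, and arms with larger estimated losses receive smaller probabilities, as expected of the OMD update.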
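The two simulated environments in the Experiment Setup row can be sketched as Bernoulli loss generators. This is an illustrative reconstruction: arm 0 is taken as the optimal arm, rewards are converted to losses (loss = 1 − reward), and the exponential growth factor for phase lengths is an assumption — the quoted setup states only that phases increase exponentially in length.

```python
import numpy as np

def stochastic_losses(T, K, eps, rng):
    """Standard stochastic MAB from the setup row: mean reward (1 + eps)/2
    for the single optimal arm (index 0) and (1 - eps)/2 for the rest,
    expressed here as Bernoulli losses (loss = 1 - reward)."""
    means = np.full(K, (1 + eps) / 2)  # loss means for suboptimal arms
    means[0] = (1 - eps) / 2           # optimal arm has the smallest mean loss
    return (rng.random((T, K)) < means).astype(float)

def stochastically_constrained_losses(T, K, eps, rng, growth=1.6):
    """Stochastically constrained adversary: the (optimal, suboptimal) loss
    means switch between (1 - eps, 1) and (0, eps), staying fixed within
    phases whose lengths grow exponentially (the factor `growth` is an
    assumed value)."""
    means = np.empty((T, K))
    t, phase_len, regime = 0, 1.0, 0
    while t < T:
        end = min(T, t + int(np.ceil(phase_len)))
        opt, sub = ((1 - eps, 1.0), (0.0, eps))[regime]
        means[t:end] = sub
        means[t:end, 0] = opt          # arm 0 stays optimal in both regimes
        t, phase_len, regime = end, phase_len * growth, 1 - regime
    return (rng.random((T, K)) < means).astype(float)
```

Note that in the adversarial generator the optimal arm's mean loss is always eps below the suboptimal arms', even though the absolute levels jump between phases; this is what makes the sequence "stochastically constrained" rather than fully adversarial.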