Multi-armed Bandits: Competing with Optimal Sequences

Authors: Zohar S. Karnin, Oren Anava

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our main result is a regret guarantee that scales with the variation parameter of the environment, without requiring any prior knowledge about it whatsoever. By that, we also resolve an open problem posted by Gur, Zeevi and Besbes [8]. An important key component in our result is a statistical test for identifying non-stationarity in a sequence of independent random variables. This test either identifies nonstationarity or upper-bounds the absolute deviation of the corresponding sequence of mean values in terms of its total variation. This test is interesting on its own right and has the potential to be found useful in additional settings.
Researcher Affiliation Industry Oren Anava The Voleon Group Berkeley, CA oren@voleon.com Zohar Karnin Yahoo! Research New York, NY zkarnin@yahoo-inc.com
Pseudocode Yes Our algorithm is formally given in Algorithm 1, and the working scheme is visually presented in Figure 2.
Open Source Code No The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets No The paper is theoretical and does not describe experiments performed on any specific, publicly available dataset.
Dataset Splits No The paper is theoretical and does not describe experimental validation with dataset splits.
Hardware Specification No The paper is theoretical and does not specify any hardware used for experiments.
Software Dependencies No The paper is theoretical and does not mention any specific software dependencies or versions.
Experiment Setup No The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations.