Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

Authors: Fang Kong, Yichi Zhou, Shuai Li

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove the proposed algorithm simultaneously achieves poly(log T) regret in the stochastic setting and minimax-optimal regret of Õ(T^{2/3}) in the adversarial setting, where T is the horizon and Õ hides parameters independent of T as well as logarithmic terms. To our knowledge, this is the first best-of-both-worlds result for general feedback graphs. (These bounds are restated below the table.)
Researcher Affiliation | Collaboration | (1) John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China; (2) Microsoft Research Asia, Beijing, China.
Pseudocode | Yes | Algorithm 1: BoBW with General Graph Feedback
Open Source Code | No | The paper does not include any statement or link indicating that open-source code for the methodology is provided.
Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so it provides no access information for a publicly available or open dataset.
Dataset Splits | No | The paper is theoretical and does not involve empirical data, so no training/validation/test splits are reported.
Hardware Specification | No | The paper is theoretical and reports no experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and reports no experimental software stack, so no software names with version numbers are provided.
Experiment Setup | No | The paper focuses on algorithm design and proofs rather than empirical experimentation, so no experimental setup details such as hyperparameters or training configurations are provided.
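For reference, the best-of-both-worlds claim quoted in the Research Type row can be restated compactly. This is only a rewriting of the bounds as given in the abstract; the subscripted regret symbols are illustrative notation introduced here (not the paper's), T is the horizon, and Õ hides factors independent of T as well as logarithmic terms.

% Claimed guarantees for Algorithm 1 (BoBW with general graph feedback)
% Stochastic setting: polylogarithmic regret in the horizon T
R_{\mathrm{sto}}(T) = O\big(\mathrm{poly}(\log T)\big)
% Adversarial setting: minimax-optimal rate for general feedback graphs
R_{\mathrm{adv}}(T) = \tilde{O}\big(T^{2/3}\big)

Graph-dependent constants and the precise definitions of the two settings are given in the paper itself and are not reproduced here.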