Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback
Authors: Fang Kong, Yichi Zhou, Shuai Li
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove the proposed algorithm simultaneously achieves $\mathrm{poly}\log T$ regret in the stochastic setting and minimax-optimal regret of $\tilde{O}(T^{2/3})$ in the adversarial setting where $T$ is the horizon and $\tilde{O}$ hides parameters independent of $T$ as well as logarithmic terms. To our knowledge, this is the first best-of-both-worlds result for general feedback graphs. |
| Researcher Affiliation | Collaboration | (1) John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China; (2) Microsoft Research Asia, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: BoBW with General Graph Feedback |
| Open Source Code | No | The paper does not include any statement or link indicating that open-source code for the methodology is provided. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits. Therefore, no information on training/validation/test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on experimental software or dependencies, so no software names with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, rather than empirical experimentation. Therefore, no experimental setup details such as hyperparameters or training configurations are provided. |