On the Minimax Regret for Online Learning with Feedback Graphs
Authors: Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, Nicolò Cesa-Bianchi
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is O(√(αT ln K)), where K is the number of actions, α is the independence number of the graph, and T is the time horizon. The ln K factor is known to be necessary when α = 1 (the experts case). On the other hand, when α = K (the bandits case), the minimax rate is known to be Θ(√(KT)), and a lower bound Ω(√(αT)) is known to hold for any α. Our improved upper bound O(√(αT(1 + ln(K/α)))) holds for any α and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with q-Tsallis entropy for a carefully chosen value of q ∈ [1/2, 1) that varies with α. |
| Researcher Affiliation | Academia | Khaled Eldowa, Università degli Studi di Milano, Milan, Italy, khaled.eldowa@unimi.it; Emmanuel Esposito, Università degli Studi di Milano, Milan, Italy & Istituto Italiano di Tecnologia, Genoa, Italy, emmanuel@emmanuelesposito.it; Tommaso Cesari, University of Ottawa, Ottawa, Canada, tcesari@uottawa.ca; Nicolò Cesa-Bianchi, Università degli Studi di Milano, Milan, Italy & Politecnico di Milano, Milan, Italy, nicolo.cesa-bianchi@unimi.it |
| Pseudocode | Yes | Algorithm 1 (q-FTRL for undirected feedback graphs). Input: q ∈ (0, 1), η > 0. Initialization: p_1(i) ← 1/K for all i = 1, …, K. For t = 1, …, T: select action I_t ∼ p_t and incur loss ℓ_t(I_t); observe the losses {ℓ_t(i) : i ∈ N_t(I_t)} and the graph G_t; construct a loss estimate ℓ̂_t for ℓ_t (e.g., via (1) or (6)); let p_{t+1} ← argmin over p in the simplex Δ_K of ⟨η ∑_{s=1}^{t} ℓ̂_s, p⟩ + ψ_q(p). End for. (A hedged code sketch of this procedure appears after the table.) |
| Open Source Code | No | The paper focuses on theoretical bounds and algorithm design. It does not contain any statement about making its source code publicly available or provide links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on mathematical proofs and algorithm analysis. It does not describe experiments that use datasets for training or evaluation, nor does it provide any information about dataset availability. |
| Dataset Splits | No | This paper is theoretical and does not describe experiments involving datasets. Consequently, it does not provide details on training, validation, or test dataset splits. |
| Hardware Specification | No | This paper is theoretical and does not describe experimental setups or hardware specifications used for computations. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers required for replication. |
| Experiment Setup | No | This paper is theoretical, focusing on mathematical derivations and algorithm analysis. It defines algorithm parameters like q and η as part of its theoretical framework but does not describe an empirical experimental setup with hyperparameters, training configurations, or system-level settings. |
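
As a complement to the pseudocode quoted above, here is a minimal Python sketch of the q-FTRL update with the q-Tsallis regularizer and a standard importance-weighted loss estimate for undirected graph feedback. The names `tsallis_ftrl_distribution`, `q_ftrl_graph_feedback`, `neighborhoods`, and `loss_fn` are illustrative placeholders; the bisection solver is just one way to compute the FTRL argmin, the estimator shown is the usual neighborhood-based importance weighting rather than necessarily the paper's exact estimators (1) or (6), and the paper's specific choice of q ∈ [1/2, 1) as a function of α is not reproduced here.

```python
import numpy as np

def tsallis_ftrl_distribution(cum_loss_est, q, eta, n_iter=100):
    """Solve p = argmin_{p in simplex} eta*<L, p> + psi_q(p), where
    psi_q(p) = (1 - sum_i p_i**q) / (1 - q) is the q-Tsallis regularizer.
    The first-order conditions give p_i = (q / ((1-q)*(eta*L_i + mu)))**(1/(1-q))
    for a multiplier mu > 0, found here by bisection on the normalization constraint."""
    L = np.asarray(cum_loss_est, dtype=float)
    K = len(L)
    L = eta * (L - L.min())              # shifting L by a constant leaves the argmin unchanged
    lo = q / (1.0 - q)                   # at mu = lo, the smallest-loss arm alone already has mass 1
    hi = q * K ** (1.0 - q) / (1.0 - q)  # at mu = hi, every arm has mass at most 1/K
    for _ in range(n_iter):
        mu = 0.5 * (lo + hi)
        p = (q / ((1.0 - q) * (L + mu))) ** (1.0 / (1.0 - q))
        if p.sum() > 1.0:
            lo = mu
        else:
            hi = mu
    return p / p.sum()                   # renormalize to absorb the remaining bisection error


def q_ftrl_graph_feedback(T, K, q, eta, neighborhoods, loss_fn, rng=None):
    """Simulation loop mirroring Algorithm 1 under the assumptions stated above.
    neighborhoods(t) returns a list N with N[i] = neighbors of arm i at round t
    (including i itself, as required for strong observability); loss_fn(t) returns
    the length-K loss vector of round t. Both are caller-supplied stand-ins."""
    if rng is None:
        rng = np.random.default_rng(0)
    cum_loss_est = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        p = tsallis_ftrl_distribution(cum_loss_est, q, eta)
        I = rng.choice(K, p=p)
        losses, N = loss_fn(t), neighborhoods(t)
        total_loss += losses[I]
        # Undirected graph: arm i's loss is observed iff I lies in N[i],
        # which happens with probability sum_{j in N[i]} p[j].
        for i in range(K):
            if I in N[i]:
                cum_loss_est[i] += losses[i] / sum(p[j] for j in N[i])
    return total_loss
```

As a sanity check of the two extreme regimes discussed in the abstract, `neighborhoods(t)` returning the complete graph at every round recovers the full-information (experts) setting, while returning only self-loops recovers the bandit setting.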