Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
No Weighted-Regret Learning in Adversarial Bandits with Delays
Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the FKM algorithm with n dimensions achieves an expected regret of O n T 3 4 + n T 1 3 D 1 3 and the EXP3 algorithm with K arms achieves an expected regret of O p log K (KT + D) even when D = PT t=1 dt and T are unknown. These bounds use a novel doubling trick that, under mild assumptions, provably retains the regret bound for when D and T are known. Using these bounds, we show that FKM and EXP3 have no weighted-regret even for dt = O (t log t). Therefore, algorithms with no weighted-regret can be used to approximate a CCE of a finite or convex unknown game that can only be simulated with bandit feedback, even if the simulation involves significant delays. Our first main contribution in this paper is to show that even with delays that cause a linear regret, the expected weighted ergodic distribution may still converge to the set of CCE, and the weighted ergodic average may still converge (in L1) to the set of NE for a two-player zero-sum game. |
| Researcher Affiliation | Academia | Ilai Bistritz EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA Zhengyuan Zhou EMAIL Stern School of Business New York University New York, NY 10003, USA Xi Chen EMAIL Stern School of Business New York University New York, NY 10003, USA Nicholas Bambos EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA Jose Blanchet EMAIL Department of Management Science & Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 1 2D Doubling Trick Wrapper for Unknown T and D; Algorithm 2 FKM with delays; Algorithm 3 EXP3 with delays |
| Open Source Code | No | No explicit statement about code release or links to code repositories are found. The paper's license and attribution link refer to the paper itself, not source code for the methodology. |
| Open Datasets | No | The paper is theoretical and analyzes algorithms with abstract cost functions lt(at). It does not mention using any specific dataset or provide information on dataset access. |
| Dataset Splits | No | The paper does not conduct experiments on datasets, therefore no dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software, libraries, or solvers with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper does not describe any experiments; therefore, no experimental setup details or hyperparameters are provided. |