Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the EXP3 algorithm (that uses the delayed feedback upon its ar-rival) achieves a regret of O r ln K KT + PT t=1 dt . For the case where PT t=1 dt and T are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of ln K K2T + PT t=1 dt . We then consider a two player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the no-regret property , (e.g., where dt = O (t log t)) the ergodic average of the strategy proļ¬le still converges to the set of Nash equilibria of the game. |
| Researcher Affiliation | Collaboration | Ilai Bistritz1, Zhengyuan Zhou23, Xi Chen2, Nicholas Bambos1, Jose Blanchet1 1Stanford University 2New York University, Stern School of Business 3IBM Research {bistritz,bambos,jose.blanchet}@stanford.edu, {zzhou,xchen3}@stern.nyu.edu |
| Pseudocode | Yes | Algorithm 1 EXP3 with delays ... Algorithm 2 Adaptive EXP3 with delays for unknown T and PT t=1 dt |
| Open Source Code | No | The paper does not provide any concrete access to source code, nor does it state that code for the described methodology is released or available. |
| Open Datasets | No | As a theoretical paper, no datasets are used for training or evaluation, and thus no access information for a public dataset is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |