Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We prove that the EXP3 algorithm (that uses the delayed feedback upon its ar-rival) achieves a regret of O r ln K KT + PT t=1 dt . For the case where PT t=1 dt and T are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of ln K K2T + PT t=1 dt . We then consider a two player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the no-regret property , (e.g., where dt = O (t log t)) the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game.
Researcher Affiliation Collaboration Ilai Bistritz1, Zhengyuan Zhou23, Xi Chen2, Nicholas Bambos1, Jose Blanchet1 1Stanford University 2New York University, Stern School of Business 3IBM Research {bistritz,bambos,jose.blanchet}@stanford.edu, {zzhou,xchen3}@stern.nyu.edu
Pseudocode Yes Algorithm 1 EXP3 with delays ... Algorithm 2 Adaptive EXP3 with delays for unknown T and PT t=1 dt
Open Source Code No The paper does not provide any concrete access to source code, nor does it state that code for the described methodology is released or available.
Open Datasets No As a theoretical paper, no datasets are used for training or evaluation, and thus no access information for a public dataset is provided.
Dataset Splits No The paper is theoretical and does not involve empirical validation or dataset splits.
Hardware Specification No The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies No The paper is theoretical and does not mention any software dependencies with specific version numbers.
Experiment Setup No The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.