Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the EXP3 algorithm (that uses the delayed feedback upon its arrival) achieves a regret of $O\left(\sqrt{\ln K \left(KT + \sum_{t=1}^{T} d_t\right)}\right)$. For the case where $\sum_{t=1}^{T} d_t$ and $T$ are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of $O\left(\sqrt{\ln K \left(K^2 T + \sum_{t=1}^{T} d_t\right)}\right)$. We then consider a two player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the no-regret property (e.g., where $d_t = O(t \log t)$), the ergodic average of the strategy profile still converges to the set of Nash equilibria of the game. |
| Researcher Affiliation | Collaboration | Ilai Bistritz¹, Zhengyuan Zhou²,³, Xi Chen², Nicholas Bambos¹, Jose Blanchet¹; ¹Stanford University, ²New York University, Stern School of Business, ³IBM Research; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 EXP3 with delays ... Algorithm 2 Adaptive EXP3 with delays for unknown $T$ and $\sum_{t=1}^{T} d_t$ |
| Open Source Code | No | The paper does not provide any concrete access to source code, nor does it state that code for the described methodology is released or available. |
| Open Datasets | No | As a theoretical paper, no datasets are used for training or evaluation, and thus no access information for a public dataset is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |
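
Since the paper releases no code, the idea behind the "EXP3 with delays" pseudocode listed in the table can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes losses in [0, 1], an exponential-weights update applied only when feedback arrives, and importance weighting by the probability the arm had when it was played. The names `exp3_delayed`, `loss_fn`, `delay_fn`, and the step size `eta` are hypothetical and chosen for illustration only.

```python
import math
import random


def exp3_delayed(num_arms, horizon, eta, loss_fn, delay_fn, seed=0):
    """Sketch: EXP3 where the loss of round t arrives at round t + delay_fn(t).

    loss_fn(t, arm) -> loss in [0, 1] (assumption); delay_fn(t) -> int >= 0.
    Returns the total incurred loss and the last mixed strategy.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * num_arms   # importance-weighted loss estimates received so far
    pending = {}                 # arrival round -> list of (arm, prob_at_play, loss)
    total_loss = 0.0
    probs = [1.0 / num_arms] * num_arms

    for t in range(horizon):
        # Exponential weights; subtracting the minimum keeps exp() stable.
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        # Play an arm; its loss is incurred now but observed only later.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        pending.setdefault(t + delay_fn(t), []).append((arm, probs[arm], loss))

        # Apply every piece of feedback that arrives in this round.
        for played, prob, observed in pending.pop(t, []):
            cum_est[played] += observed / prob

    # Feedback scheduled to arrive after the horizon is never used.
    return total_loss, probs


if __name__ == "__main__":
    # Toy instance (assumption): arm 0 is best, every delay is 5 rounds.
    total, final = exp3_delayed(
        num_arms=3,
        horizon=2000,
        eta=0.05,  # arbitrary illustrative step size, not the paper's tuning
        loss_fn=lambda t, arm: 0.1 if arm == 0 else 0.9,
        delay_fn=lambda t: 5,
    )
    print(total, final)
```

The sketch takes `eta` as a free parameter rather than deriving it from $T$ and $\sum_{t=1}^{T} d_t$; the adaptive variant listed as Algorithm 2 handles the case where those quantities are unknown and is not reproduced here.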