Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Authors: Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We prove that the EXP3 algorithm (that uses the delayed feedback upon its ar-rival) achieves a regret of O r ln K KT + PT t=1 dt . For the case where PT t=1 dt and T are unknown, we propose a novel doubling trick for online learning with delays and prove that this adaptive EXP3 achieves a regret of ln K K2T + PT t=1 dt . We then consider a two player zero-sum game where players experience asynchronous delays. We show that even when the delays are large enough such that players no longer enjoy the no-regret property , (e.g., where dt = O (t log t)) the ergodic average of the strategy proﬁle still converges to the set of Nash equilibria of the game.
Researcher Affiliation	Collaboration	Ilai Bistritz1, Zhengyuan Zhou23, Xi Chen2, Nicholas Bambos1, Jose Blanchet1 1Stanford University 2New York University, Stern School of Business 3IBM Research EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 EXP3 with delays ... Algorithm 2 Adaptive EXP3 with delays for unknown T and PT t=1 dt
Open Source Code	No	The paper does not provide any concrete access to source code, nor does it state that code for the described methodology is released or available.
Open Datasets	No	As a theoretical paper, no datasets are used for training or evaluation, and thus no access information for a public dataset is provided.
Dataset Splits	No	The paper is theoretical and does not involve empirical validation or dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not mention any software dependencies with specific version numbers.
Experiment Setup	No	The paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.