Learning in Generalized Linear Contextual Bandits with Stochastic Delays
Authors: Zhengyuan Zhou, Renyuan Xu, Jose Blanchet
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards become available to the decision maker only after an unknown, stochastic delay, even though a decision must be made at each time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration and establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. To the best of our knowledge, these regret bounds provide the first theoretical characterizations of (generalized) linear contextual bandits with large delays, and they contribute to the broad landscape of contextual bandits literature by delineating the impact of delays on performance. |
| Researcher Affiliation | Collaboration | Zhengyuan Zhou (Department of Electrical Engineering, Stanford University; Bytedance Inc.), Renyuan Xu (Department of Industrial Engineering and Operations Research, UC Berkeley), Jose Blanchet (Department of Management Science and Engineering, Stanford University) |
| Pseudocode | Yes | Algorithm 1: DUCB-GLCB; Algorithm 2: Delayed BaseLinUCB at Step t; Algorithm 3: Delayed SupLinUCB |
| Open Source Code | No | The paper does not include an unambiguous sentence stating that the authors are releasing the code, nor does it provide a direct link to a source-code repository for the methodology described. |
| Open Datasets | No | This paper is theoretical and focuses on establishing regret bounds for algorithms. It does not conduct empirical experiments that would involve training on a dataset, nor does it mention any specific dataset used for such a purpose or its public availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide details on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis. It does not report on empirical experiments, and therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, presenting algorithms and mathematical proofs. It does not describe any implementation details that would require listing software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis. It does not provide details about an experimental setup, such as hyperparameter values, training schedules, or system-level configurations, as no empirical experiments are reported. |
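The paper's core idea, as summarized above, is UCB-style exploration in which reward feedback arrives after a stochastic delay, so only samples whose delay has elapsed can enter the confidence estimates. The sketch below is not the paper's DUCB-GLCB; it is a minimal delayed linear UCB illustration under assumed parameters (dimension `d`, exploration weight `alpha`, ridge parameter `lam`, and geometric delays are all made up for the example): pending feedback is buffered, and the regularized design matrix and reward vector are updated only when a reward arrives.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, alpha, lam = 5, 2000, 1.0, 1.0   # assumed toy parameters

theta_star = rng.normal(size=d)        # unknown true parameter (unit norm)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)   # regularized design matrix over *observed* samples only
b = np.zeros(d)       # sum of context * reward over observed samples
pending = []          # buffered feedback: (arrival_time, context, reward)

for t in range(T):
    # incorporate feedback whose stochastic delay has now elapsed
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, x, r in arrived:
        V += np.outer(x, x)
        b += x * r

    theta_hat = np.linalg.solve(V, b)  # ridge estimate from observed data
    V_inv = np.linalg.inv(V)

    # a fresh set of candidate contexts each round (unit vectors)
    contexts = rng.normal(size=(10, d))
    contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)

    # UCB score: estimated reward plus exploration bonus ||x||_{V^{-1}}
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", contexts, V_inv, contexts))
    x = contexts[np.argmax(ucb)]

    # reward is generated now but revealed only after a geometric delay
    r = x @ theta_star + 0.1 * rng.normal()
    delay = rng.geometric(0.2)
    pending.append((t + delay, x, r))
```

The key design point mirrored from the paper's setting is that the estimator and the exploration bonus are built only from delay-elapsed samples, which is what inflates the confidence widths (and hence the regret) relative to the undelayed case.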