Learning in Generalized Linear Contextual Bandits with Stochastic Delays
Authors: Zhengyuan Zhou, Renyuan Xu, Jose Blanchet
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards become available to the decision maker only after an unknown, stochastic delay, even though a decision must be made at each time step for an incoming set of contexts. We study the performance of upper confidence bound (UCB) based algorithms adapted to this delayed setting. In particular, we design a delay-adaptive algorithm, which we call Delayed UCB, for generalized linear contextual bandits using UCB-style exploration and establish regret bounds under various delay assumptions. In the important special case of linear contextual bandits, we further modify this algorithm and establish a tighter regret bound under the same delay assumptions. To the best of our knowledge, these regret bounds provide the first theoretical characterizations of (generalized) linear contextual bandits with large delays, and they contribute to the broad landscape of contextual bandits literature by delineating the impact of delays on performance. |
| Researcher Affiliation | Collaboration | Zhengyuan Zhou (Department of Electrical Engineering, Stanford University; Bytedance Inc.), Renyuan Xu (Department of Industrial Engineering and Operations Research, UC Berkeley), Jose Blanchet (Department of Management Science and Engineering, Stanford University) |
| Pseudocode | Yes | Algorithm 1: DUCB-GLCB; Algorithm 2: Delayed BaseLinUCB at Step t; Algorithm 3: Delayed SupLinUCB |
| Open Source Code | No | The paper does not include an unambiguous sentence stating that the authors are releasing the code, nor does it provide a direct link to a source-code repository for the methodology described. |
| Open Datasets | No | This paper is theoretical and focuses on establishing regret bounds for algorithms. It does not conduct empirical experiments that would involve training on a dataset, nor does it mention any specific dataset used for such a purpose or its public availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide details on training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and regret analysis. It does not report on empirical experiments, and therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, presenting algorithms and mathematical proofs. It does not describe any implementation details that would require listing software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis. It does not provide details about an experimental setup, such as hyperparameter values, training schedules, or system-level configurations, as no empirical experiments are reported. |
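The paper's core idea, as summarized above, is UCB-style exploration in which reward feedback arrives after a stochastic delay, so only samples whose delay has elapsed can enter the confidence estimates. The sketch below is not the paper's DUCB-GLCB; it is a minimal delayed linear UCB illustration under assumed parameters (dimension `d`, exploration weight `alpha`, ridge parameter `lam`, and geometric delays are all made up for the example): pending feedback is buffered, and the regularized design matrix and reward vector are updated only when a reward arrives.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, alpha, lam = 5, 2000, 1.0, 1.0   # assumed toy parameters

theta_star = rng.normal(size=d)        # unknown true parameter (unit norm)
theta_star /= np.linalg.norm(theta_star)

V = lam * np.eye(d)   # regularized design matrix over *observed* samples only
b = np.zeros(d)       # sum of context * reward over observed samples
pending = []          # buffered feedback: (arrival_time, context, reward)

for t in range(T):
    # incorporate feedback whose stochastic delay has now elapsed
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, x, r in arrived:
        V += np.outer(x, x)
        b += x * r

    theta_hat = np.linalg.solve(V, b)  # ridge estimate from observed data
    V_inv = np.linalg.inv(V)

    # a fresh set of candidate contexts each round (unit vectors)
    contexts = rng.normal(size=(10, d))
    contexts /= np.linalg.norm(contexts, axis=1, keepdims=True)

    # UCB score: estimated reward plus exploration bonus ||x||_{V^{-1}}
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", contexts, V_inv, contexts))
    x = contexts[np.argmax(ucb)]

    # reward is generated now but revealed only after a geometric delay
    r = x @ theta_star + 0.1 * rng.normal()
    delay = rng.geometric(0.2)
    pending.append((t + delay, x, r))
```

The key design point mirrored from the paper's setting is that the estimator and the exploration bonus are built only from delay-elapsed samples, which is what inflates the confidence widths (and hence the regret) relative to the undelayed case.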