Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Authors: Lihong Li, Yu Lu, Dengyong Zhou
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose an upper confidence bound based algorithm for generalized linear contextual bandits, which achieves an Õ(√(dT)) regret over T rounds with d-dimensional feature vectors. This regret matches the minimax lower bound, up to logarithmic terms, and improves on the best previous result by a √d factor, assuming the number of arms is fixed. A key component in our analysis is to establish a new, sharp finite-sample confidence bound for maximum-likelihood estimates in generalized linear models, which may be of independent interest. |
| Researcher Affiliation | Collaboration | Lihong Li (Microsoft Research, Redmond, WA 98052), Yu Lu (Department of Statistics, Yale University, New Haven, CT, USA), and Dengyong Zhou (Microsoft Research, Redmond, WA 98052). |
| Pseudocode | Yes | Algorithm 1: UCB-GLM; Algorithm 2: CB-GLM; Algorithm 3: SupCB-GLM (see the sketch of UCB-GLM after the table). |
| Open Source Code | No | The paper does not provide any statements about open-source code availability, nor does it include links to code repositories. |
| Open Datasets | No | The paper defines a theoretical problem setting where 'a context consisting of a set of K feature vectors... which is drawn IID from an unknown distribution ν', but it does not specify or provide access information for a publicly available dataset. |
| Dataset Splits | No | The paper does not describe any experimental setup involving data splits (e.g., training, validation, test sets), as it is a theoretical work. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis; therefore, it does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis; therefore, it does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical analysis and algorithm development; thus, it does not provide details about experimental setup, hyperparameters, or training configurations. |
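The pseudocode row above refers to Algorithm 1, UCB-GLM. The following is a minimal sketch of that idea for intuition only, assuming a logistic link function; the Newton-based maximum-likelihood solver, the `alpha` and `reg` parameters, and the crude initial exploration phase are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

class UCBGLM:
    """Sketch of the UCB-GLM idea: fit a GLM maximum-likelihood estimate on
    past observations, then pick the arm maximizing an optimistic score
    x^T theta_hat + alpha * ||x||_{V^{-1}}.  Details here are illustrative."""

    def __init__(self, d, alpha=1.0, reg=1e-6):
        self.d = d                 # feature dimension
        self.alpha = alpha         # exploration width for the UCB bonus
        self.reg = reg             # tiny ridge term to keep V invertible
        self.X = []                # past chosen feature vectors
        self.y = []                # past observed rewards
        self.V = reg * np.eye(d)   # design matrix V_t = sum_s X_s X_s^T

    @staticmethod
    def _mu(z):
        # Logistic link function mu(z) = 1 / (1 + exp(-z)).
        return 1.0 / (1.0 + np.exp(-z))

    def _mle(self, n_iter=25):
        # Solve sum_s (y_s - mu(X_s^T theta)) X_s = 0 (lightly regularized)
        # by Newton's method on the logistic log-likelihood.
        X = np.vstack(self.X)
        y = np.asarray(self.y)
        theta = np.zeros(self.d)
        for _ in range(n_iter):
            p = self._mu(X @ theta)
            grad = X.T @ (y - p) - self.reg * theta
            W = p * (1.0 - p)
            hess = -(X * W[:, None]).T @ X - self.reg * np.eye(self.d)
            theta -= np.linalg.solve(hess, grad)
        return theta

    def select(self, contexts):
        """contexts: (K, d) array with one feature vector per arm.
        Returns the index of the arm with the largest UCB score."""
        if len(self.y) < self.d:
            # Crude initial exploration phase until the MLE is well defined.
            return np.random.randint(len(contexts))
        theta = self._mle()
        V_inv = np.linalg.inv(self.V)
        bonus = np.sqrt(np.einsum("kd,de,ke->k", contexts, V_inv, contexts))
        return int(np.argmax(contexts @ theta + self.alpha * bonus))

    def update(self, x, reward):
        # Record the chosen feature vector and reward, and update V_t.
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(reward))
        self.V += np.outer(x, x)
```

At each round one would call `select` on the current (K, d) context matrix, pull the returned arm, and then call `update` with the chosen feature vector and the observed reward; the exploration width `alpha` stands in for the confidence-bound tuning parameter that drives the quoted regret analysis.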