Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Authors: Lihong Li, Yu Lu, Dengyong Zhou

ICML 2017

Reproducibility Variable — Result — LLM Response

Research Type — Theoretical — "In this work, we propose an upper confidence bound based algorithm for generalized linear contextual bandits, which achieves an Õ(√(dT)) regret over T rounds with d-dimensional feature vectors. This regret matches the minimax lower bound, up to logarithmic terms, and improves on the best previous result by a √d factor, assuming the number of arms is fixed. A key component in our analysis is to establish a new, sharp finite-sample confidence bound for maximum-likelihood estimates in generalized linear models, which may be of independent interest."

Researcher Affiliation — Collaboration — Lihong Li¹, Yu Lu², Dengyong Zhou¹. ¹Microsoft Research, Redmond, WA 98052; ²Department of Statistics, Yale University, New Haven, CT, USA.

Pseudocode — Yes — Algorithm 1 (UCB-GLM), Algorithm 2 (CB-GLM), Algorithm 3 (SupCB-GLM).

Open Source Code — No — The paper makes no statement about open-source code availability and includes no links to code repositories.

Open Datasets — No — The paper defines a theoretical problem setting in which "a context consisting of a set of K feature vectors ... is drawn IID from an unknown distribution ν"; it does not specify or provide access information for any publicly available dataset.

Dataset Splits — No — As a theoretical work, the paper describes no experimental setup involving data splits (e.g., training, validation, or test sets).

Hardware Specification — No — The paper is theoretical and focuses on algorithm design and analysis; it does not mention any specific hardware used for experiments.

Software Dependencies — No — The paper is theoretical and focuses on algorithm design and analysis; it does not mention any specific software dependencies with version numbers.

Experiment Setup — No — The paper focuses on theoretical analysis and algorithm development; it does not provide details about experimental setup, hyperparameters, or training configurations.
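To make the UCB-GLM pseudocode referenced above concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes a logistic link function, a random exploration phase of length `tau`, and an exploration constant `alpha`, with a maximum-likelihood estimate refit by Newton's method each round. All function names, the synthetic environment, and the constants are our own choices for illustration.

```python
import numpy as np

def mu(z):
    # Logistic link function (an assumed choice of GLM link).
    return 1.0 / (1.0 + np.exp(-z))

def mle_logistic(X, y, iters=20):
    # Newton's method for the logistic-GLM maximum-likelihood estimate.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = mu(X @ theta)
        g = X.T @ (y - p)                       # gradient of log-likelihood
        W = p * (1.0 - p)
        # Hessian, with a tiny ridge term for numerical stability.
        H = (X * W[:, None]).T @ X + 1e-6 * np.eye(X.shape[1])
        theta += np.linalg.solve(H, g)
    return theta

def ucb_glm(contexts, reward_fn, T, tau, alpha, rng):
    """contexts(t) -> (K, d) array of arm features at round t;
    reward_fn(t, a) -> observed reward for pulling arm a at round t."""
    d = contexts(0).shape[1]
    V = np.zeros((d, d))                        # design matrix sum x x^T
    X_hist, y_hist, chosen = [], [], []
    for t in range(T):
        A = contexts(t)
        if t < tau:
            # Pure-exploration phase: pick an arm uniformly at random.
            a = int(rng.integers(A.shape[0]))
        else:
            theta = mle_logistic(np.array(X_hist), np.array(y_hist))
            Vinv = np.linalg.inv(V + 1e-6 * np.eye(d))
            # Confidence width sqrt(x^T V^{-1} x) for each arm.
            width = np.sqrt(np.einsum('kd,de,ke->k', A, Vinv, A))
            # Optimistic (UCB) arm choice.
            a = int(np.argmax(A @ theta + alpha * width))
        x = A[a]
        y = reward_fn(t, a)
        V += np.outer(x, x)
        X_hist.append(x)
        y_hist.append(y)
        chosen.append(a)
    return chosen, mle_logistic(np.array(X_hist), np.array(y_hist))
```

As a design note, this sketch refits the MLE from scratch every round for clarity; an efficient implementation would warm-start Newton's method from the previous estimate and update V⁻¹ incrementally via the Sherman–Morrison formula.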