Improved Optimistic Algorithms for Logistic Bandits

Authors: Louis Faury, Marc Abeille, Clement Calauzenes, Olivier Fercoq

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys an Õ(√T) regret with no dependency on κ, except in a second-order term. Our analysis is based on a new tail inequality for self-normalized martingales, of independent interest. Our main contributions are: 1) we answer positively the conjecture of Filippi et al. (2010), showing that a slightly modified version of GLM-UCB enjoys an Õ(d√(κT)) frequentist regret (Theorem 2). 2) Further, we propose a new algorithm with yet better dependency on κ, showing that it can be pushed into a second-order term. This results in an Õ(d√T + κ) regret bound (Theorem 3). 3) A key ingredient of our analysis is a new Bernstein-like inequality for self-normalized martingales, of independent interest (Theorem 1).
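The constant κ that the paper works to remove can be made concrete: for the logistic link µ(z) = 1/(1 + e^(−z)), κ = sup 1/µ̇(z) over the admissible range |z| ≤ S, and it grows exponentially with the parameter-norm bound S. A minimal sketch (the helper names and the choice of S are illustrative, not from the paper):

```python
import math

def mu(z):
    """Logistic (sigmoid) mean-reward function."""
    return 1.0 / (1.0 + math.exp(-z))

def mu_dot(z):
    """Derivative of the sigmoid: mu'(z) = mu(z) * (1 - mu(z))."""
    m = mu(z)
    return m * (1.0 - m)

def kappa(S):
    """kappa = sup_{|z| <= S} 1 / mu'(z).

    For the sigmoid, mu' is minimized at the boundary z = +/- S,
    so the supremum is attained there.
    """
    return 1.0 / mu_dot(S)

# kappa blows up exponentially with S, which is why a sqrt(kappa)
# or first-order kappa factor in a regret bound is costly.
for S in (0, 1, 5, 10):
    print(S, kappa(S))
```

At S = 0 the sigmoid slope is 1/4, so κ = 4; at S = 10, κ already exceeds 2 × 10⁴, roughly e^S.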
Researcher Affiliation Collaboration 1Criteo AI Lab, Paris, France; 2LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France.
Pseudocode Yes Algorithm 1 (Logistic-UCB-1). Input: regularization parameter λ. For t ≥ 1: compute θ_t^(1) (Equation (8)); observe the context-action feature set X_t; play x_t = argmax_{x ∈ X_t} [µ(x^T θ_t^(1)) + ε_{t,1}(x)]; observe reward r_{t+1}. Algorithm 2 (Logistic-UCB-2). Input: regularization parameter λ. Initialize the set of admissible log-odds W_0 = Θ. For t ≥ 1: compute θ_t^(2) (Equation (9)); observe the context-action feature set X_t; play x_t = argmax_{x ∈ X_t} [µ(x^T θ_t^(2)) + ε_{t,2}(x, θ_t^(2))]; observe reward r_{t+1}; compute the log-odds ℓ_t = inf_{θ ∈ C_t(δ) ∩ Θ} x_t^T θ and ℓ̄_t = sup_{θ ∈ C_t(δ) ∩ Θ} x_t^T θ; add the new constraint to the feasible set: W_{t+1} = W_t ∩ {θ : ℓ_t ≤ θ^T x_t ≤ ℓ̄_t}.
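The loop structure of Algorithm 1 can be sketched in runnable form. This is a simplified illustration, not the authors' implementation: the estimator below uses a few Newton steps on the regularized logistic likelihood as a stand-in for the projected estimator of Equation (8), and the bonus uses a linear-style norm β‖x‖_{V⁻¹} in place of the paper's exact ε_{t,1}; the names (`fit_theta`, `logistic_ucb1`, `beta`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu(z):
    """Logistic mean-reward function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_theta(X, r, lam):
    """Regularized logistic MLE via a few Newton steps (illustrative
    stand-in for the estimator theta_t^(1) of Equation (8))."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(25):
        p = mu(X @ theta)
        grad = X.T @ (p - r) + lam * theta
        W = p * (1.0 - p)                       # sigmoid curvature weights
        H = (X * W[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(H, grad)
    return theta

def logistic_ucb1(arms_fn, theta_star, T, lam=1.0, beta=1.0):
    """Optimistic loop: play the arm maximizing mu(x^T theta) + bonus.

    arms_fn(t) returns the context-dependent arm set X_t (rows = features);
    rewards are Bernoulli with mean mu(x^T theta_star).
    """
    d = theta_star.shape[0]
    Xs, rs = [], []
    theta = np.zeros(d)
    V = lam * np.eye(d)                         # design matrix for the bonus
    for t in range(T):
        arms = arms_fn(t)
        Vinv = np.linalg.inv(V)
        # optimistic index: predicted reward plus exploration bonus
        bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
        x = arms[int(np.argmax(mu(arms @ theta) + bonus))]
        r = float(rng.random() < mu(x @ theta_star))
        Xs.append(x); rs.append(r)
        V += np.outer(x, x)
        theta = fit_theta(np.array(Xs), np.array(rs), lam)
    return theta
```

A usage sketch: with a fixed three-arm set in R², `logistic_ucb1(lambda t: arms, theta_star, T=30)` returns the final parameter estimate after 30 optimistic rounds.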
Open Source Code No The paper does not contain any statement about releasing source code or provide a link to a code repository for the described methodology.
Open Datasets No The paper focuses on theoretical analysis and algorithm design and does not involve training models on any specific datasets, public or otherwise.
Dataset Splits No The paper is theoretical and does not involve empirical validation on datasets, thus no dataset split information for validation is provided.
Hardware Specification No The paper focuses on theoretical analysis and algorithm design, and as such, it does not provide any details about hardware specifications used for experiments.
Software Dependencies No The paper is purely theoretical, focusing on mathematical proofs and algorithm design, and therefore does not list specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and focuses on developing algorithms and deriving regret bounds; it does not include details on experimental setup such as hyperparameters or training configurations.