Improved Optimistic Algorithms for Logistic Bandits
Authors: Louis Faury, Marc Abeille, Clement Calauzenes, Olivier Fercoq
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys an Õ(√T) regret with no dependency on κ, except in a second-order term. Our analysis is based on a new tail inequality for self-normalized martingales, of independent interest. Our main contributions are: 1) we answer positively the conjecture of Filippi et al. (2010), showing that a slightly modified version of GLM-UCB enjoys an Õ(d√(κT)) frequentist regret (Theorem 2). 2) Further, we propose a new algorithm with yet better dependency on κ, showing that it can be pushed into a second-order term. This results in an Õ(d√T + κ) regret bound (Theorem 3). 3) A key ingredient of our analysis is a new Bernstein-like inequality for self-normalized martingales, of independent interest (Theorem 1). |
| Researcher Affiliation | Collaboration | 1Criteo AI Lab, Paris, France 2LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France. |
| Pseudocode | Yes | Algorithm 1 Logistic-UCB-1. Input: regularization parameter λ. For t ≥ 1: compute θ_t^(1) (Equation (8)); observe the context–action feature set X_t; play x_t = argmax_{x ∈ X_t} μ(xᵀθ_t^(1)) + ε_{t,1}(x); observe reward r_{t+1}. Algorithm 2 Logistic-UCB-2. Input: regularization parameter λ. Initialize the set of admissible log-odds W_0 = Θ. For t ≥ 1: compute θ_t^(2) (Equation (9)); observe the context–action feature set X_t; play x_t = argmax_{x ∈ X_t} μ(xᵀθ_t^(2)) + ε_{t,2}(x, θ_t^(2)); observe reward r_{t+1}; compute the log-odds bound ℓ̄_t = sup_{θ ∈ C_t(δ) ∩ Θ} x_tᵀθ (and the corresponding infimum ℓ_t); add the new constraint to the feasible set: W_{t+1} = W_t ∩ {θ : ℓ_t ≤ θᵀx_t ≤ ℓ̄_t}. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper focuses on theoretical analysis and algorithm design and does not involve training models on any specific datasets, public or otherwise. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, thus no dataset split information for validation is provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and as such, it does not provide any details about hardware specifications used for experiments. |
| Software Dependencies | No | The paper is purely theoretical, focusing on mathematical proofs and algorithm design, and therefore does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on developing algorithms and deriving regret bounds; it does not include details on experimental setup such as hyperparameters or training configurations. |
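Since the paper releases no code, the optimistic arm-selection step shared by both algorithms can be illustrated with a short sketch. This is not the authors' implementation: the exploration bonus below is a generic `β · ||x||_{V⁻¹}` width (as in standard GLM-UCB-style analyses), standing in for the paper's refined bonuses ε_{t,1}(x) and ε_{t,2}(x, θ), and the names `ucb_arm`, `theta_hat`, `V_inv`, and `beta` are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def ucb_arm(arms, theta_hat, V_inv, beta):
    """Pick the arm maximizing mu(x^T theta_hat) + beta * ||x||_{V^-1}.

    A generic optimistic (UCB) selection rule in the spirit of
    Logistic-UCB-1; the paper's actual bonus is a finer quantity.

    arms:      (K, d) array, one feature vector per arm in X_t
    theta_hat: (d,) current parameter estimate
    V_inv:     (d, d) inverse of the (regularized) design matrix
    beta:      scalar confidence-width multiplier
    """
    scores = []
    for x in arms:
        width = np.sqrt(x @ V_inv @ x)  # ||x||_{V^{-1}}, exploration width
        scores.append(sigmoid(x @ theta_hat) + beta * width)
    return int(np.argmax(scores))
```

With `beta = 0` the rule is purely greedy on the estimated reward μ(xᵀθ̂); a larger `beta` shifts selection toward directions poorly covered by the design matrix, which is the exploration mechanism both algorithms rely on.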