Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Improved Optimistic Algorithms for Logistic Bandits
Authors: Louis Faury, Marc Abeille, Clement Calauzenes, Olivier Fercoq
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys a O(T) regret with no dependency in κ, but for a second order term. Our analysis is based on a new tail-inequality for selfnormalized martingales, of independent interest. Our main contributions are : 1) we answer positively to the conjecture of Filippi et al. (2010) showing that a slightly modified version of GLM-UCB enjoys a O(dκT) frequentist regret (Theorem 2). 2) Further, we propose a new algorithm with yet better dependencies in κ, showing that it can be pushed in a second-order term. This results in a O(dT + κ) regret bound (Theorem 3). 3) A key ingredient of our analysis is a new Bernstein-like inequality for self-normalized martingales, of independent interest (Theorem 1). |
| Researcher Affiliation | Collaboration | 1Criteo AI Lab, Paris, France 2LTCI, Tel ecom Paris, Institut Polytechnique de Paris, Palaiseau, France. |
| Pseudocode | Yes | Algorithm 1 Logistic-UCB-1 Input: regularization parameter λ for t 1 do Compute θ(1) t (Equation (8)) Observe the contexts-action feature set Xt. Play xt = arg maxx Xt µ(x Tθ(1) t ) + ϵt,1(x) Observe rewards rt+1. end for and Algorithm 2 Logistic-UCB-2 Input: regularization parameter λ Initialize the set of admissible log-odds W0 = Θ for t 1 do Compute θ(2) t (Equation (9)) Observe the contexts-action feature set Xt. Play xt = arg maxx Xt µ(x Tθ(2) t ) + ϵt,2(x, θ(2) t ). Observe rewards rt+1. Compute the log-odds ℓt = supθ Ct(δ) Θ x T t θ . Add the new constraint to the feasible set: Wt+1 = Wt {θ : ℓt θTxt ℓt} |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | No | The paper focuses on theoretical analysis and algorithm design and does not involve training models on any specific datasets, public or otherwise. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, thus no dataset split information for validation is provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design, and as such, it does not provide any details about hardware specifications used for experiments. |
| Software Dependencies | No | The paper is purely theoretical, focusing on mathematical proofs and algorithm design, and therefore does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on developing algorithms and deriving regret bounds; it does not include details on experimental setup such as hyperparameters or training configurations. |