Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Authors: Yu-Jie Zhang, Sheng-An Xu, Peng Zhao, Masashi Sugiyama
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates the proposed method on two representative GLB problems: logistic bandits ($\mu(z) = 1/(1 + e^{-z})$) with bounded rewards, and Poisson bandits ($\mu(z) = e^{z}$), which pose a distinct challenge as an unbounded GLB setting. We also conduct experiments on real data from the Covertype dataset [Blackard, 1998], with more detailed results provided in Appendix E. |
| Researcher Affiliation | Academia | (1) RIKEN AIP, Tokyo, Japan; (2) National Key Laboratory for Novel Software Technology, Nanjing University, China; (3) School of Artificial Intelligence, Nanjing University, China; (4) The University of Tokyo, Chiba, Japan. Correspondence: Peng Zhao <EMAIL> |
| Pseudocode | Yes | Algorithm 1 GLB-OMD. Input: self-concordant constant $R$, Lipschitz constant $L_\mu$, parameter radius $S$, confidence level $\delta$. 1: Initialize $\theta_1 \in \Theta := \{\theta \in \mathbb{R}^d \mid \lVert\theta\rVert_2 \le S\}$ and $H_1 = \lambda I_d$. 2: for $t = 1$ to $T$ do 3: Construct the confidence set $\mathcal{C}_t(\delta)$ according to (5). 4: Select the arm $X_t$ according to rule (6) and receive the reward $r_t$. 5: Update the online estimator $\theta_{t+1}$ by (3) and set $H_{t+1} = H_t + \nabla^2 \ell_t(\theta_{t+1})$. 6: end for |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code and data are not released. |
| Open Datasets | Yes | We also conduct experiments on real data from the Covertype dataset [Blackard, 1998] |
| Dataset Splits | No | For the Covertype dataset, the paper describes partitioning the data into K = 60 clusters to define the arms and binarizing the rewards, and it sets the horizon to T = 1000. However, it does not explicitly provide train/test/validation splits (percentages, sample counts, or methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | All the experiments were conducted on Intel Xeon Gold 6242R processors (40 cores, 4.1GHz base frequency). |
| Software Dependencies | No | The algorithms were implemented in Python, utilizing the scipy library for numerical computations, such as solving non-linear optimization problems and calculating vector norms, and employing np.linalg.pinv to compute the pseudo-inverse of matrices. The running time was measured using the time library. |
| Experiment Setup | Yes | Throughout our experiments, all algorithm parameters were configured according to their theoretical derivations without additional fine-tuning, with the sole exception of the regularization parameter λ. To ensure a fair comparison, we adopted a unified approach for setting λ across different algorithm categories: we set λ = d for all efficient online algorithms (including GLB-OMD, RS-GLinCB, ECOLog, and GLOC), while using λ = d log(1 + t) for offline algorithms that require regularization. For this task, we set the horizon to T = 1000 and the confidence parameter to δ = 0.01. After analyzing the data, we set S = 6 and κ = 200. |
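The Research Type row names two link functions. As a rough illustration of how they shape the reward model, here is a minimal sketch. It assumes the usual pairing of Bernoulli rewards with the logistic link and Poisson counts with the exponential link. The parameter `theta_star` and the helper names are hypothetical and are not taken from the paper.

```python
import numpy as np


def mu_logistic(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z)); the mean lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def mu_poisson(z):
    """Poisson (log-linear) link mu(z) = exp(z); the mean is unbounded above."""
    return np.exp(z)


def sample_reward(x, theta_star, model, rng):
    """Draw one reward for arm x under a hypothetical true parameter theta_star.

    Assumes Bernoulli rewards for the logistic model and Poisson counts for the
    Poisson model (an assumption, not the paper's stated simulator).
    """
    z = float(np.dot(x, theta_star))
    if model == "logistic":
        return int(rng.binomial(1, mu_logistic(z)))  # bounded reward in {0, 1}
    if model == "poisson":
        return int(rng.poisson(mu_poisson(z)))       # unbounded count in {0, 1, 2, ...}
    raise ValueError(f"unknown model: {model}")


rng = np.random.default_rng(0)
x = np.array([0.5, -0.2, 0.1])
theta_star = np.array([1.0, 0.5, -0.3])  # hypothetical parameter for illustration
print(sample_reward(x, theta_star, "logistic", rng),
      sample_reward(x, theta_star, "poisson", rng))
```

The contrast is visible in the return values: the logistic reward can only be 0 or 1, while the Poisson reward has no upper bound. This is why the row singles out the Poisson case as the harder, unbounded setting.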
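The loop structure of Algorithm 1 (GLB-OMD) in the Pseudocode row can be sketched roughly as follows. This is a hypothetical sketch, not the authors' code: equations (3), (5), and (6) are not reproduced in this report, so the sketch substitutes stand-ins. It assumes a logistic link. The confidence set $\mathcal{C}_t(\delta)$ is replaced by a fixed bonus `beta`, and rule (6) is assumed to be an optimistic (UCB-style) choice. Update (3) is approximated by a single Newton step on $\ell_t(\theta) + \lVert\theta - \theta_t\rVert_{H_t}^2 / (2\eta)$, with a plain Euclidean rescaling onto $\Theta$. The constants $R$ and $L_\mu$ are not used. Only the initialization $H_1 = \lambda I_d$ and the accumulation $H_{t+1} = H_t + \nabla^2 \ell_t(\theta_{t+1})$ follow the row directly.

```python
import numpy as np


def mu(z):
    # Logistic link (assumed; Algorithm 1 is stated for general GLBs)
    return 1.0 / (1.0 + np.exp(-z))


def dmu(z):
    # Derivative of the logistic link, used in the Hessian of the logistic loss
    m = mu(z)
    return m * (1.0 - m)


def glb_omd_sketch(arms, theta_star, T, S, lam, beta, eta=1.0, seed=0):
    """Hypothetical one-pass loop mirroring the structure of Algorithm 1."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    theta = np.zeros(d)    # theta_1 in Theta = {||theta||_2 <= S}
    H = lam * np.eye(d)    # H_1 = lambda * I_d
    for _ in range(T):
        H_inv = np.linalg.inv(H)
        # Stand-in for rule (6): optimistic index with a fixed bonus beta
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, H_inv, arms))
        x = arms[np.argmax(arms @ theta + beta * widths)]
        r = rng.binomial(1, mu(x @ theta_star))  # simulated Bernoulli reward
        # Stand-in for update (3): one Newton step on
        #   l_t(theta) + ||theta - theta_t||_{H_t}^2 / (2 * eta)
        z = x @ theta
        grad = (mu(z) - r) * x
        hess = dmu(z) * np.outer(x, x) + H / eta
        theta = theta - np.linalg.solve(hess, grad)
        # Simplified projection: rescale onto the Euclidean ball of radius S
        norm = np.linalg.norm(theta)
        if norm > S:
            theta *= S / norm
        # H_{t+1} = H_t + Hessian of the logistic loss at theta_{t+1}
        H = H + dmu(x @ theta) * np.outer(x, x)
    return theta, H


rng = np.random.default_rng(0)
d, K = 5, 20
arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)  # unit-norm arm features
theta_star = rng.normal(size=d)
theta_star *= 2.0 / np.linalg.norm(theta_star)        # ||theta_star||_2 = 2 <= S
theta_hat, H = glb_omd_sketch(arms, theta_star, T=200, S=3.0, lam=float(d), beta=1.0)
print(np.round(theta_hat, 2))
```

The property that the name "one-pass" points to is that each round touches only the current sample $(X_t, r_t)$ and the running matrix $H_t$. Past data are never revisited, so per-round cost does not grow with $t$.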
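The Dataset Splits row mentions clustering the Covertype data into K = 60 clusters to define arms and binarizing the rewards. The following is a minimal sketch of what such preprocessing might look like, under explicit assumptions. The row does not state the binarization rule, so `positive_label` (reward 1 when a sample's cover type equals it) is hypothetical. Plain Lloyd's K-means is also an assumption, and `build_arms` is a hypothetical name. Synthetic data stand in for Covertype, which has 54 features and cover-type labels 1 to 7.

```python
import numpy as np


def _assign(X, centers):
    # Nearest-center assignment via ||a||^2 - 2 a.b + ||b||^2 (memory O(n * K))
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)


def build_arms(features, labels, K, positive_label, n_iter=20, seed=0):
    """Hypothetical preprocessing: K-means centroids as arms, binarized mean reward per arm."""
    X = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        assign = _assign(X, centers)
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    assign = _assign(X, centers)  # final assignment with the updated centers
    binary = (np.asarray(labels) == positive_label).astype(float)
    counts = np.bincount(assign, minlength=K)
    positives = np.bincount(assign, weights=binary, minlength=K)
    # Fraction of positive labels per cluster; 0 for an empty cluster
    mean_reward = np.divide(positives, counts, out=np.zeros(K), where=counts > 0)
    return centers, mean_reward


rng = np.random.default_rng(1)
features = rng.normal(size=(600, 54))  # synthetic stand-in for the 54 Covertype features
labels = rng.integers(1, 8, size=600)  # synthetic stand-in for cover types 1..7
arms, arm_reward = build_arms(features, labels, K=60, positive_label=2)
print(arms.shape, arm_reward.shape)
```

Even a sketch like this exposes choices the report flags as unspecified. These include the clustering initialization and seed, which samples enter the clustering, and the binarization rule. Each of them affects the resulting arms.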
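The settings in the Experiment Setup row can be collected into a small configuration object. This is a minimal sketch: `ExperimentConfig` is a hypothetical name, the feature dimension `d` is not given in the row and must be supplied by the user, and the natural logarithm is assumed for λ = d log(1 + t).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the settings reported in the Experiment Setup row."""
    d: int               # feature dimension (not stated in the row; user-supplied)
    T: int = 1000        # horizon
    delta: float = 0.01  # confidence parameter
    S: float = 6.0       # parameter-norm bound S
    kappa: float = 200.0 # problem-dependent constant; the row gives only its value

    def lam_online(self) -> float:
        """lambda = d for the efficient online algorithms (GLB-OMD, RS-GLinCB, ECOLog, GLOC)."""
        return float(self.d)

    def lam_offline(self, t: int) -> float:
        """lambda = d * log(1 + t) for offline algorithms; natural log assumed."""
        return self.d * math.log(1 + t)


cfg = ExperimentConfig(d=10)
print(cfg.lam_online(), round(cfg.lam_offline(cfg.T), 2))
```

Keeping λ as a method rather than a constant makes the distinction in the row explicit. It is fixed for the online methods but grows with the round index t for the offline ones.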