Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
Authors: Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose novel algorithms with firstand second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when there is an action achieving a small cumulative loss or the loss has a small variance. In addition, we need only assumptions weaker than those of existing algorithms; our algorithms work on discrete action sets as well as continuous ones without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors.In this paper, we present newly-devised, efficient algorithms with improved firstand second-order regret bounds for the adversarial linear bandit problem. Our algorithms not only yield better regret bounds but also make only fewer assumptions than have been seen in previous studies. |
| Researcher Affiliation | Collaboration | Shinji Ito NEC Corporation EMAIL Shuichi Hirahara National Institute of Informatics EMAIL Tasuku Soma The University of Tokyo EMAIL Yuichi Yoshida National Institute of Informatics EMAIL |
| Pseudocode | Yes | Algorithm 1 An algorithm for adversarial linear bandits with predicted loss 1: for t = 1, 2, . . . , T do 2: repeat 3: Pick xt from the distribution pt, defined by (5). 4: until xt 2 S(pt) 1 dγ2 t 5: Choose at A so that E[at] = xt, play at, and receive a loss ℓt, at as feedback. 6: Compute an unbiased estimator ˆℓt of ℓt as ˆℓt = mt + ℓt mt, at S( pt) 1xt. 7: Update pt as in (5). 8: end for |
| Open Source Code | No | The paper does not provide any specific repository link, explicit code release statement, or mention of code in supplementary materials for the described methodology. |
| Open Datasets | No | The paper is a theoretical work focusing on algorithms and regret bounds for adversarial linear bandits, and it does not use or refer to any specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is a theoretical work and does not involve empirical evaluation with datasets, thus no dataset split information is provided. |
| Hardware Specification | No | The paper is a theoretical work on algorithms and regret bounds, and thus does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and mathematical proofs; it does not mention specific software dependencies or versions required to replicate any experiments. |
| Experiment Setup | No | The paper is theoretical and describes algorithms and their regret bounds; it does not include details on an experimental setup, hyperparameters, or training configurations. |