Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Best-of-Both-Worlds Linear Contextual Bandits
Authors: Masahiro Kato, Shinji Ito
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. Then, we propose a strategy that extends the Real Lin Exp3 by Neu & Olkhovskaya (2020) and the Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is shown to be upper-bounded by O min log3(T ) T log2(T) , where T N is the number of rounds, > 0 is the constant minimum gap between the best and suboptimal arms for any context, and C [0, T] is an adversarial corruption parameter. This regret upper bound implies O log3(T ) in a stochastic environment and by O T log2(T) in an adversarial environment. We refer to our strategy as the Best-of-Both-Worlds (Bo BW) Real FTRL, due to its theoretical guarantees in both stochastic and adversarial regimes. |
| Researcher Affiliation | Academia | Masahiro Kato EMAIL The University of Tokyo; Shinji Ito EMAIL The University of Tokyo RIKEN AIP NEC Corporation (affiliation upon submission) |
| Pseudocode | Yes | Algorithm 1 Bo BW-Real FTRL. Algorithm 2 Matrix Geometric Resampling (Neu & Olkhovskaya, 2020). |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. It references other works and their algorithms but does not provide its own. |
| Open Datasets | No | The paper discusses theoretical concepts within linear contextual bandits, such as a 'contextual distribution D'. It does not refer to any specific real-world or synthetic datasets used for empirical evaluation, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper focuses on theoretical analysis and algorithm design for linear contextual bandits. It does not describe any empirical experiments using datasets, and therefore, no information regarding training, validation, or test dataset splits is provided. |
| Hardware Specification | No | The paper does not provide any specific details about hardware (such as CPU, GPU models, or memory) used for running experiments. The work is theoretical in nature, presenting algorithms and regret bounds. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or their version numbers that would be required to reproduce experiments or implementations. It focuses on algorithmic design and mathematical proofs. |
| Experiment Setup | No | The paper is theoretical, focusing on algorithm design and regret analysis. It does not contain details about an experimental setup, such as specific hyperparameter values, training configurations, or system-level settings, as no empirical experiments are described. |