First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our main contribution is to replace the worst-case rate by adaptive bounds. Specifically, we obtain a bound of O(√V_T) in terms of a quadratic measure V_T of the variance of the losses of the algorithm, and a bound of O(√L*_T), where L*_T is the cumulative loss incurred by the optimal policy. Such bounds in terms of L*_T or V_T are generally referred to as first-order and second-order bounds, respectively, and have been extensively studied in the bandit literature. (A notation sketch of L*_T and V_T is given after this table.)
Researcher Affiliation | Academia | Julia Olkhovskaya (1), Jack Mayo (2), Tim van Erven (2), Gergely Neu (3), Chen-Yu Wei (4). (1) Department of Intelligent Systems, Delft University of Technology, Delft, The Netherlands; (2) Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands; (3) AI group, DTIC, Universitat Pompeu Fabra, Barcelona, Spain; (4) MIT Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA.
Pseudocode | Yes | Algorithm 1 (CONTEXTEW). Parameters: γ > 0, η_1, ..., η_T > 0, m_1, ..., m_T. For t = 1, ..., T: 1. Observe X_t. 2. Repeat: pick Q_t from the distribution p_t defined in (8), until ∑_{a=1}^K ||Q_{t,a} X_t||²_{Σ_{t,a}^{-1}} ≤ dKγ² (condition (6)), where Σ_{t,a} is defined in (9). (A code sketch of this resampling step is given after this table.)
Open Source Code | No | The paper does not contain any statement about making its own code open source, nor does it provide a link to a repository.
Open Datasets | No | The paper does not refer to any specific dataset or provide access information for one. It mentions that 'the d-dimensional contexts are drawn from a fixed known distribution', but this is a theoretical assumption, not a specific dataset.
Dataset Splits | No | The paper is a theoretical work focused on deriving bounds and analyzing algorithms. It does not conduct experiments on a dataset and therefore does not mention any training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not report on running experiments. While it discusses computational efficiency, it does not provide any specific hardware details used for running experiments.
Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations and algorithm analysis rather than practical implementation. It does not list specific software dependencies with version numbers needed to replicate any experiments.
Experiment Setup | No | The paper introduces an algorithm (CONTEXTEW) with parameters such as γ, η_t, and m_t, but these are part of the algorithm's theoretical definition rather than specific values used in an empirical experimental setup with hardware, software, or training configurations.
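The Research Type row quotes the paper's first- and second-order bounds. The block below is a minimal notation sketch of the two adaptive quantities, assuming the standard adversarial linear contextual bandit setup in which the loss of action a in round t is the inner product ⟨X_t, θ_{t,a}⟩ and the learner draws actions from a distribution p_t. The exact definitions in the paper (in particular the centring and scaling of V_T, and any logarithmic factors hidden in the O(·) notation) may differ; this is only meant to fix notation.

```latex
% Hedged notation sketch (not copied from the paper): losses are assumed to be
% linear in the context, \ell_{t,a} = \langle X_t, \theta_{t,a} \rangle.

% First-order quantity: cumulative loss of the optimal (benchmark) policy.
\[
  L_T^* = \min_{\pi} \, \mathbb{E}\!\left[\sum_{t=1}^{T} \langle X_t, \theta_{t,\pi(X_t)} \rangle\right],
  \qquad \text{regret} = O\!\big(\sqrt{L_T^*}\big).
\]

% Second-order quantity: a quadratic (second-moment) measure of the algorithm's
% own losses; the paper's precise definition of V_T may be centred differently.
\[
  V_T = \sum_{t=1}^{T} \mathbb{E}_{a \sim p_t}\!\left[\langle X_t, \theta_{t,a} \rangle^{2}\right],
  \qquad \text{regret} = O\!\big(\sqrt{V_T}\big).
\]
```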
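The Pseudocode row only preserves the resampling step of Algorithm 1 (CONTEXTEW). Below is a minimal Python sketch of that step, assuming the distribution p_t (equation (8)) and the matrices Σ_{t,a} (equation (9)) are supplied by the caller. The sampler sample_from_pt, the helper name resample_Q, the array shapes, and the max_tries guard are all illustrative placeholders rather than the paper's implementation; only the acceptance test from condition (6) is transcribed literally.

```python
# Minimal, hypothetical sketch of the resampling step in Algorithm 1 (CONTEXTEW).
# `sample_from_pt` stands in for the distribution p_t of equation (8) and
# `Sigma_inv` for the inverses of the matrices Sigma_{t,a} of equation (9);
# neither is defined in the extracted snippet, so their shapes here are assumptions.
import numpy as np


def resample_Q(sample_from_pt, Sigma_inv, X_t, d, K, gamma, max_tries=1000):
    """Redraw Q_t from p_t until sum_a ||Q_{t,a} X_t||^2_{Sigma_{t,a}^{-1}} <= d*K*gamma^2."""
    threshold = d * K * gamma ** 2
    for _ in range(max_tries):
        Q_t = sample_from_pt()  # assumed shape (K, d, d): one matrix per action
        stat = 0.0
        for a in range(K):
            v = Q_t[a] @ X_t              # the vector Q_{t,a} X_t
            stat += v @ Sigma_inv[a] @ v  # its squared norm in the Sigma_{t,a}^{-1} geometry
        if stat <= threshold:  # acceptance test, condition (6)
            return Q_t
    raise RuntimeError("resampling did not terminate within max_tries")


# Purely illustrative usage with dummy placeholder quantities:
if __name__ == "__main__":
    d, K, gamma = 3, 4, 0.5
    rng = np.random.default_rng(0)
    Q = resample_Q(
        sample_from_pt=lambda: 0.1 * rng.normal(size=(K, d, d)),
        Sigma_inv=np.stack([np.eye(d)] * K),
        X_t=rng.normal(size=d),
        d=d, K=K, gamma=gamma,
    )
    print(Q.shape)  # (4, 3, 3)
```

The max_tries cap exists only so the sketch cannot loop forever; the extracted snippet of the algorithm itself specifies no such parameter.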