First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main contribution is to replace the worst-case rate by adaptive bounds. Specifically, we obtain a bound of O(√V_T) in terms of a quadratic measure of variance V_T for the losses of the algorithm, and a bound of O(√L*_T), where L*_T is the cumulative loss incurred by the optimal policy. Such bounds in terms of L*_T or V_T are generally referred to as first-order and second-order bounds, respectively, and have been extensively studied in the bandit literature. (Both bounds are written out below the table.) |
| Researcher Affiliation | Academia | Julia Olkhovskaya (1), Jack Mayo (2), Tim van Erven (2), Gergely Neu (3), Chen-Yu Wei (4). (1) Department of Intelligent Systems, Delft University of Technology, Delft, The Netherlands; (2) Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, The Netherlands; (3) AI group, DTIC, Universitat Pompeu Fabra, Barcelona, Spain; (4) MIT Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA |
| Pseudocode | Yes | Algorithm 1 CONTEXTEW. Parameters: γ > 0, η_1, ..., η_T > 0, m_1, ..., m_T. For t = 1, ..., T: 1. Observe X_t. 2. Repeat: pick Q_t from the distribution p_t defined in (8), until ∑_{a=1}^{K} Q_{t,a} ‖X_t‖²_{Σ_{t,a}⁻¹} ≤ dK/γ² (6), where Σ_{t,a} is defined in (9). (A hedged Python sketch of this rejection-sampling step appears below the table.) |
| Open Source Code | No | The paper does not contain any statement about making its own code open source, nor does it provide a link to a repository. |
| Open Datasets | No | The paper does not refer to any specific dataset or provide access information for one. It mentions that 'the d-dimensional contexts are drawn from a fixed known distribution', but this is a theoretical assumption, not a specific dataset. |
| Dataset Splits | No | The paper is a theoretical work focusing on deriving bounds and analyzing algorithms. It does not conduct experiments on a dataset, and therefore does not mention any training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and does not report any experiments. While it discusses computational efficiency, it does not specify any hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations and algorithm analysis rather than practical implementation. It does not list specific software dependencies with version numbers needed to replicate any experiments. |
| Experiment Setup | No | The paper introduces an algorithm (CONTEXTEW) with parameters such as γ, η_t, and m_t, but these are part of the algorithm's theoretical definition rather than specific values used in an empirical 'experimental setup' with hardware, software, or training configurations. |
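
For readability, the two adaptive bounds quoted in the Research Type row are written out below. This is only a restatement of the quoted abstract: V_T and L*_T are described there as a quadratic measure of variance of the algorithm's losses and the cumulative loss of the optimal policy, respectively, and the displayed forms do not reproduce the paper's formal definitions.

```latex
% Second-order bound: V_T is a quadratic measure of variance of the algorithm's losses.
\mathrm{Regret}_T = O\!\left(\sqrt{V_T}\right)

% First-order bound: L_T^* is the cumulative loss incurred by the optimal policy.
\mathrm{Regret}_T = O\!\left(\sqrt{L_T^*}\right)
```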
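
The rejection-sampling step quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation: the threshold dK/γ² is a best-effort reconstruction of the garbled inequality (6), and `draw_from_pt` is a hypothetical stand-in for sampling Q_t from the exponential-weights distribution p_t of equation (8).

```python
import numpy as np

def acceptable(Q_t, X_t, Sigma_inv, d, K, gamma):
    """Acceptance test for the repeat-until step of Algorithm 1 (CONTEXTEW).

    Checks sum_a Q_t[a] * X_t^T Sigma_inv[a] X_t <= d*K / gamma**2,
    where the right-hand side is an assumed reconstruction of the
    threshold in Eq. (6).

    Q_t:       (K,) candidate action probabilities for context X_t
    X_t:       (d,) observed context
    Sigma_inv: (K, d, d) per-action inverse covariance estimates (Eq. (9))
    """
    # sum_a Q_t[a] * ||X_t||^2 in the Sigma_inv[a] norm
    weighted_norm = np.einsum('a,i,aij,j->', Q_t, X_t, Sigma_inv, X_t)
    return weighted_norm <= d * K / gamma**2

def sample_Q_t(draw_from_pt, X_t, Sigma_inv, d, K, gamma, max_tries=10_000):
    """Repeat: draw Q_t from p_t (Eq. (8)) until the acceptance condition holds."""
    for _ in range(max_tries):
        Q_t = draw_from_pt(X_t)  # hypothetical sampler for the distribution p_t
        if acceptable(Q_t, X_t, Sigma_inv, d, K, gamma):
            return Q_t
    raise RuntimeError("rejection sampling did not accept within max_tries draws")
```

The `max_tries` guard is only a safeguard for the sketch; the pseudocode itself simply repeats the draw until the condition in (6) is satisfied.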