Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We greatly improve these results by achieving a regret of e O( T) without a simulator, while maintaining computational efficiency when the action set in each round is small. In the special case of sleeping bandits with adversarial loss and stochastic arm availability, our result answers affirmatively the open question by Saha et al. [2020] on whether there exists a polynomial-time algorithm with poly(d) T regret. Our approach naturally handles the case where the loss is linear up to an additive misspecification error, and our regret shows near-optimal dependence on the magnitude of the error. |
| Researcher Affiliation | Collaboration | Haolin Liu University of Virginia srs8rh@virginia.edu Chen-Yu Wei University of Virginia chenyu.wei@virginia.edu Julian Zimmert Google Research zimmert@google.com |
| Pseudocode | Yes | Algorithm 1 Logdet-FTRL for linear contextual bandits. Algorithm 2 Linear EXP4. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds; it does not report on empirical studies using specific datasets. Therefore, no information about public datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments with datasets. Therefore, there is no mention of training/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments with specific software implementations. Therefore, no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis rather than empirical experimentation. Therefore, no experimental setup details such as hyperparameters or system-level training settings are provided. |