An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of O(T 2 3 (K log(|Π|)) 1 3 ) and makes at most O(K) calls per round to an offline optimization oracle, where K denotes the number of actions, T denotes the number of rounds and Π denotes the set of policies. This is the first result to improve the prior best bound of O((TK) 2 3 (log(|Π|)) 1 3 ) as obtained by Syrgkanis et al. at Neur IPS 2016, and the first to match the original bound of Langford and Zhang at Neur IPS 2007 which was obtained for the stochastic case. |
| Researcher Affiliation | Academia | Kiarash Banihashem University of Maryland, College Park kiarash@umd.edu Mohammad Taghi Hajiaghayi University of Maryland, College Park hajiagha@umd.edu Suho Shin University of Maryland, College Park suhoshin@umd.edu Max Springer University of Maryland, College Park mss423@umd.edu |
| Pseudocode | Yes | Algorithm 1: Contextual Bandits Algorithm Algorithm 2: Compute q t |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a public dataset. It refers to contexts being 'sequentially drawn i.i.d from a known distribution D' and the learner having 'sampling access to the distribution D', which are theoretical assumptions rather than dataset specifications. |
| Dataset Splits | No | The paper is theoretical and does not involve data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |