Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d. from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of O(T^{2/3} (K log(|Π|))^{1/3}) and makes at most O(K) calls per round to an offline optimization oracle, where K denotes the number of actions, T denotes the number of rounds and Π denotes the set of policies. This is the first result to improve the prior best bound of O((TK)^{2/3} (log(|Π|))^{1/3}) as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007, which was obtained for the stochastic case. |
| Researcher Affiliation | Academia | Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer (all University of Maryland, College Park) |
| Pseudocode | Yes | Algorithm 1: Contextual Bandits Algorithm; Algorithm 2: Compute q_t |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a public dataset. It refers to contexts being 'sequentially drawn i.i.d from a known distribution D' and the learner having 'sampling access to the distribution D', which are theoretical assumptions rather than dataset specifications. |
| Dataset Splits | No | The paper is theoretical and does not involve data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
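As a sanity check on the abstract's claim (this derivation is not part of the original report), dividing the prior best bound of Syrgkanis et al. by the paper's new bound isolates the improvement, a factor of K^{1/3} in the number of actions:

```latex
\[
\frac{O\!\big((TK)^{2/3}\,(\log|\Pi|)^{1/3}\big)}
     {O\!\big(T^{2/3}\,(K\log|\Pi|)^{1/3}\big)}
= \frac{T^{2/3}\,K^{2/3}}{T^{2/3}\,K^{1/3}}
= K^{1/3}.
\]
```

The T^{2/3} dependence itself is unchanged; the improvement is entirely in how the bound scales with the action count K, matching the stochastic-case bound of Langford and Zhang.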