Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d. from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of O(T^{2/3} (K log(|Π|))^{1/3}) and makes at most O(K) calls per round to an offline optimization oracle, where K denotes the number of actions, T denotes the number of rounds and Π denotes the set of policies. This is the first result to improve the prior best bound of O((TK)^{2/3} (log(|Π|))^{1/3}) as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007, which was obtained for the stochastic case. |
| Researcher Affiliation | Academia | Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer (all University of Maryland, College Park) |
| Pseudocode | Yes | Algorithm 1: Contextual Bandits Algorithm; Algorithm 2: Compute q_t |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on a public dataset. It refers to contexts being 'sequentially drawn i.i.d from a known distribution D' and the learner having 'sampling access to the distribution D', which are theoretical assumptions rather than dataset specifications. |
| Dataset Splits | No | The paper is theoretical and does not involve data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
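As a sanity check on the abstract's claim (this derivation is not part of the original report), dividing the prior best bound of Syrgkanis et al. by the paper's new bound isolates the improvement, a factor of K^{1/3} in the number of actions:

```latex
\[
\frac{O\!\big((TK)^{2/3}\,(\log|\Pi|)^{1/3}\big)}
     {O\!\big(T^{2/3}\,(K\log|\Pi|)^{1/3}\big)}
= \frac{T^{2/3}\,K^{2/3}}{T^{2/3}\,K^{1/3}}
= K^{1/3}.
\]
```

The T^{2/3} dependence itself is unchanged; the improvement is entirely in how the bound scales with the action count K, matching the stochastic-case bound of Langford and Zhang.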