Efficient Algorithms for Adversarial Contextual Learning

Authors: Vasilis Syrgkanis, Akshay Krishnamurthy, Robert Schapire

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai & Vempala, 2005) and achieve regret O(T 3/4p K log(N)) in the transductive setting and O(T 2/3d3/4K p log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. The proof of the theorem is based on a well-known bethe-leader argument.
Researcher Affiliation Collaboration Vasilis Syrgkanis VASY@MICROSOFT.COM Microsoft Research, 641 Avenue of the Americas, New York, NY 10011 USA Akshay Krishnamurthy AKSHAYKR@CS.CMU.EDU Microsoft Research, 641 Avenue of the Americas, New York, NY 10011 USA Robert E. Schapire SCHAPIRE@MICROSOFT.COM Microsoft Research, 641 Avenue of the Americas, New York, NY 10011 USA
Pseudocode Yes Algorithm 1 Follow the perturbed leader with fake sample perturbations FTPL. Algorithm 2 Contextual Follow the Perturbed Leader Algorithm CONTEXT-FTPL(X, ǫ). Algorithm 3 Contextual Semi-Bandit Algorithm CONTEXT-SEMI-BANDIT-FTPL(X, ǫ, L).
Open Source Code No The paper does not provide any statement about making its source code publicly available, nor does it include links to a code repository.
Open Datasets No This is a theoretical paper that does not conduct empirical experiments, therefore it does not use or make publicly available any dataset for training.
Dataset Splits No This is a theoretical paper that does not conduct empirical experiments and therefore does not specify training, validation, or test dataset splits.
Hardware Specification No This is a theoretical paper that does not describe any specific hardware used for running experiments.
Software Dependencies No This is a theoretical paper and does not specify any software dependencies with version numbers.
Experiment Setup No This is a theoretical paper and does not describe an experimental setup with specific hyperparameters or training settings.