BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

Authors: Alexander Rakhlin, Karthik Sridharan

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the fullinformation supervised online learning problem has a non-trivial regret guarantee (and efficient).
Researcher Affiliation Academia Alexander Rakhlin RAKHLIN@WHARTON.UPENN.EDU University of Pennsylvania Karthik Sridharan SRIDHARAN@CS.CORNELL.EDU Cornell University
Pseudocode Yes Algorithm 1 BISTRO: Band It S wi Th Relaxati Ons input Parameter γ (0,1 d) 1: for t = 1,...,n do 2: Observe xt. Draw xt+1 n Px and t+1 n . 3: Construct Y (t) and define q t to be a minimizer of Tej min M M[x1 n] over q d and set qt = (1 γd)q t + γ1. (9) 4: Predict yt qt and observe ct( yt). 5: Create an estimate ct: ct(j) = I{ yt = j} ct( yt) qt(j).
Open Source Code No The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets No The paper mentions working with "i.i.d. draws of contexts (e.g. unlabeled data)" and drawing from an "unknown distribution Px", but it does not specify or provide access information for any publicly available or open dataset.
Dataset Splits No As a theoretical paper, it does not describe experimental dataset splits for training, validation, or testing.
Hardware Specification No The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies No The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and focuses on algorithmic design and theoretical analysis; it does not provide specific experimental setup details such as hyperparameter values or training configurations.