BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
Authors: Alexander Rakhlin, Karthik Sridharan
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the fullinformation supervised online learning problem has a non-trivial regret guarantee (and efficient). |
| Researcher Affiliation | Academia | Alexander Rakhlin RAKHLIN@WHARTON.UPENN.EDU University of Pennsylvania Karthik Sridharan SRIDHARAN@CS.CORNELL.EDU Cornell University |
| Pseudocode | Yes | Algorithm 1 BISTRO: Band It S wi Th Relaxati Ons input Parameter γ (0,1 d) 1: for t = 1,...,n do 2: Observe xt. Draw xt+1 n Px and t+1 n . 3: Construct Y (t) and define q t to be a minimizer of Tej min M M[x1 n] over q d and set qt = (1 γd)q t + γ1. (9) 4: Predict yt qt and observe ct( yt). 5: Create an estimate ct: ct(j) = I{ yt = j} ct( yt) qt(j). |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper mentions working with "i.i.d. draws of contexts (e.g. unlabeled data)" and drawing from an "unknown distribution Px", but it does not specify or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | As a theoretical paper, it does not describe experimental dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic design and theoretical analysis; it does not provide specific experimental setup details such as hyperparameter values or training configurations. |