reproducibilityindex.ai

BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

Authors: Alexander Rakhlin, Karthik Sridharan

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efﬁcient) whenever the fullinformation supervised online learning problem has a non-trivial regret guarantee (and efﬁcient).
Researcher Affiliation	Academia	Alexander Rakhlin RAKHLIN@WHARTON.UPENN.EDU University of Pennsylvania Karthik Sridharan SRIDHARAN@CS.CORNELL.EDU Cornell University
Pseudocode	Yes	Algorithm 1 BISTRO: Band It S wi Th Relaxati Ons input Parameter γ (0,1 d) 1: for t = 1,...,n do 2: Observe xt. Draw xt+1 n Px and t+1 n . 3: Construct Y (t) and deﬁne q t to be a minimizer of Tej min M M[x1 n] over q d and set qt = (1 γd)q t + γ1. (9) 4: Predict yt qt and observe ct( yt). 5: Create an estimate ct: ct(j) = I{ yt = j} ct( yt) qt(j).
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The paper mentions working with "i.i.d. draws of contexts (e.g. unlabeled data)" and drawing from an "unknown distribution Px", but it does not specify or provide access information for any publicly available or open dataset.
Dataset Splits	No	As a theoretical paper, it does not describe experimental dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies	No	The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithmic design and theoretical analysis; it does not provide specific experimental setup details such as hyperparameter values or training configurations.