Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
Authors: Alexander Rakhlin, Karthik Sridharan
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the fullinformation supervised online learning problem has a non-trivial regret guarantee (and efficient). |
| Researcher Affiliation | Academia | Alexander Rakhlin EMAIL University of Pennsylvania Karthik Sridharan EMAIL Cornell University |
| Pseudocode | Yes | Algorithm 1 BISTRO: Band It S wi Th Relaxati Ons input Parameter γ (0,1 d) 1: for t = 1,...,n do 2: Observe xt. Draw xt+1 n Px and t+1 n . 3: Construct Y (t) and define q t to be a minimizer of Tej min M M[x1 n] over q d and set qt = (1 γd)q t + γ1. (9) 4: Predict yt qt and observe ct( yt). 5: Create an estimate ct: ct(j) = I{ yt = j} ct( yt) qt(j). |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper mentions working with "i.i.d. draws of contexts (e.g. unlabeled data)" and drawing from an "unknown distribution Px", but it does not specify or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | As a theoretical paper, it does not describe experimental dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic design and theoretical analysis; it does not provide specific experimental setup details such as hyperparameter values or training configurations. |