The Blinded Bandit: Learning with Adaptive Feedback
Authors: Ofer Dekel, Elad Hazan, Tomer Koren
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. In this paper, we present a new algorithm for the blinded bandit setting and prove that it guarantees a regret of O(√T) on any oblivious sequence of loss values. |
| Researcher Affiliation | Collaboration | Ofer Dekel, Microsoft Research (oferd@microsoft.com); Elad Hazan, Technion (ehazan@ie.technion.ac.il); Tomer Koren, Technion (tomerk@technion.ac.il) |
| Pseudocode | Yes | Algorithm 1: BLINDED EXP3 and Algorithm 2: BLINDED GEOMETRICHEDGE |
| Open Source Code | No | The paper does not mention providing access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |