The Blinded Bandit: Learning with Adaptive Feedback

Authors: Ofer Dekel, Elad Hazan, Tomer Koren

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. In this paper, we present a new algorithm for the blinded bandit setting and prove that it guarantees a regret of O(√T) on any oblivious sequence of loss values. (The regret notion being referred to is recalled below the table.)
Researcher Affiliation | Collaboration | Ofer Dekel, Microsoft Research (oferd@microsoft.com); Elad Hazan, Technion (ehazan@ie.technion.ac.il); Tomer Koren, Technion (tomerk@technion.ac.il)
Pseudocode | Yes | Algorithm 1: BLINDED EXP3 and Algorithm 2: BLINDED GEOMETRICHEDGE (a rough illustrative sketch of the blinded-feedback idea follows the table)
Open Source Code | No | The paper does not mention providing access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus no dataset access information is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations.
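For context on the Research Type row: the regret being referred to is the standard adversarial-bandit regret against the best fixed arm in hindsight. The display below restates the usual textbook definition rather than quoting the paper; the symbols T (horizon), k (number of arms), ℓ_t (losses in [0, 1]), and x_t (the arm played on round t) are the customary ones, not notation lifted from this paper.

\[
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} \ell_t(x_t)\right] \;-\; \min_{x \in \{1,\dots,k\}} \sum_{t=1}^{T} \ell_t(x) \;=\; O\!\left(\sqrt{T}\right).
\]

The row's claim is that this rate remains achievable even when feedback is withheld on rounds where the player switches arms, matching the optimal rate in T for the standard multi-armed bandit.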
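To make the Pseudocode row concrete, here is a minimal Python sketch of an EXP3-style strategy adapted to blinded feedback. It is an assumption-laden illustration, not the paper's BLINDED EXP3 pseudocode: the two-rounds-per-sample pairing, the function name blinded_exp3_sketch, and the loss_fn oracle are hypothetical choices made for this example. The only idea taken from the setting itself is that feedback from rounds on which a switch may have occurred is never used for learning.

```python
import numpy as np

def blinded_exp3_sketch(loss_fn, num_arms, horizon, eta=0.1, seed=0):
    """EXP3-style sketch for bandits with blinded (switch-dependent) feedback.

    Each sampled arm is played for two consecutive rounds: the first round
    of the pair may follow a switch, so its feedback is treated as withheld
    and discarded; the second round repeats the same arm (no switch), so its
    loss is observed and drives an importance-weighted exponential update.
    `loss_fn(t, arm)` is a hypothetical oracle returning a loss in [0, 1]
    from an oblivious (fixed-in-advance) loss sequence.
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(num_arms)
    t = 0
    while t < horizon:
        probs = weights / weights.sum()
        arm = rng.choice(num_arms, p=probs)

        # Round 1 of the pair: a switch may have happened here, so any
        # feedback is considered blinded and is simply not used.
        t += 1

        if t < horizon:
            # Round 2 of the pair: same arm, no switch, the loss is observed.
            loss = loss_fn(t, arm)
            t += 1
            # Standard EXP3 importance-weighted update on the observed loss.
            estimate = np.zeros(num_arms)
            estimate[arm] = loss / probs[arm]
            weights *= np.exp(-eta * estimate)
    return weights / weights.sum()

# Example usage with a toy oblivious loss sequence (arm 0 is slightly better).
if __name__ == "__main__":
    gen = np.random.default_rng(1)
    losses = gen.uniform(0.0, 1.0, size=(10_000, 3))
    losses[:, 0] *= 0.8
    final_probs = blinded_exp3_sketch(lambda t, a: losses[t, a], 3, 10_000)
    print(final_probs)  # probability mass should concentrate on arm 0
```

In this sketch the player learns from only about half of the rounds, which affects the regret by a constant factor only and is therefore consistent with the table's claim that the asymptotic √T rate matches the standard bandit setting.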