Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Blinded Bandit: Learning with Adaptive Feedback

Authors: Ofer Dekel, Elad Hazan, Tomer Koren

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. In this paper, we present a new algorithm for the blinded bandit setting and prove that it guarantees a regret of O(√T) on any oblivious sequence of loss values. |
| Researcher Affiliation | Collaboration | Ofer Dekel, Microsoft Research, EMAIL; Elad Hazan, Technion, EMAIL; Tomer Koren, Technion, EMAIL |
| Pseudocode | Yes | Algorithm 1: BLINDED EXP3 and Algorithm 2: BLINDED GEOMETRICHEDGE |
| Open Source Code | No | The paper does not mention providing access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, so no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |