Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Blinded Bandit: Learning with Adaptive Feedback
Authors: Ofer Dekel, Elad Hazan, Tomer Koren
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. In this paper, we present a new algorithm for the blinded bandit setting and prove that it guarantees a regret of O(√T) on any oblivious sequence of loss values. |
| Researcher Affiliation | Collaboration | Ofer Dekel Microsoft Research EMAIL Elad Hazan Technion EMAIL Tomer Koren Technion EMAIL |
| Pseudocode | Yes | Algorithm 1: BLINDED EXP3 and Algorithm 2: BLINDED GEOMETRICHEDGE |
| Open Source Code | No | The paper does not mention providing access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |