Efficient learning by implicit exploration in bandit problems with side observations
Authors: Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot be always computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem. |
| Researcher Affiliation | Collaboration | Tom aˇs Koc ak Gergely Neu Michal Valko R emi Munos Seque L team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr Current affiliation: Google DeepMind |
| Pseudocode | Yes | Algorithm 1 EXP3-IX ... Algorithm 2 FPL-IX |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not mention using any specific datasets for training or evaluation. It focuses on algorithm design and theoretical regret guarantees. |
| Dataset Splits | No | The paper is theoretical and does not discuss empirical experiments with dataset splits. Therefore, no training/validation/test splits are provided. |
| Hardware Specification | No | The paper describes theoretical algorithms and their performance guarantees. It does not conduct empirical experiments, and thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, presenting algorithms and their proofs. It does not mention any specific software dependencies or versions required for implementation or reproduction of empirical results. |
| Experiment Setup | No | The paper describes theoretical algorithms and their performance. It does not include an experimental setup section with hyperparameters or training settings, as it does not report on empirical experiments. |