Efficient learning by implicit exploration in bandit problems with side observations

Authors: Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem.
Researcher Affiliation | Collaboration | Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos. SequeL team, INRIA Lille - Nord Europe, France. {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr. Current affiliation: Google DeepMind.
Pseudocode | Yes | Algorithm 1: EXP3-IX ... Algorithm 2: FPL-IX (see the illustrative sketch after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not mention using any specific datasets for training or evaluation. It focuses on algorithm design and theoretical regret guarantees.
Dataset Splits | No | The paper is theoretical and does not discuss empirical experiments with dataset splits. Therefore, no training/validation/test splits are provided.
Hardware Specification | No | The paper describes theoretical algorithms and their performance guarantees. It does not conduct empirical experiments, and thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical, presenting algorithms and their proofs. It does not mention any specific software dependencies or versions required for implementation or reproduction of empirical results.
Experiment Setup | No | The paper describes theoretical algorithms and their performance. It does not include an experimental setup section with hyperparameters or training settings, as it does not report on empirical experiments.
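To make the Pseudocode entry above concrete, here is a minimal Python sketch of an EXP3-IX-style update, reconstructed from the paper's high-level description rather than copied from Algorithm 1. The environment interface play_round and the schedules eta and gamma are hypothetical placeholders introduced for illustration; the essential ingredient is the implicit-exploration loss estimate, which divides each observed loss by the probability of observing that arm plus a small bias term gamma(t), instead of mixing explicit uniform exploration into the sampling distribution.

```python
import numpy as np

def exp3_ix(n_arms, n_rounds, play_round, eta, gamma, rng=None):
    """Minimal sketch of an EXP3-IX-style learner for bandits with side observations.

    Hypothetical environment interface (not from the paper):
      play_round(t, arm) -> (observed, losses, reveals)
        observed : length-n_arms boolean mask, True for arms whose losses were
                   revealed (the played arm and its neighbours in round t's graph)
        losses   : length-n_arms array of losses in [0, 1]; unobserved entries ignored
        reveals  : n_arms x n_arms boolean matrix, reveals[j, i] True if playing j
                   would have revealed the loss of i (self-loops included),
                   disclosed only after the action is chosen
    eta, gamma : callables returning the learning rate and the implicit-exploration
                 parameter for round t (the paper's tuning couples gamma to eta).
    """
    rng = np.random.default_rng() if rng is None else rng
    cum_loss_est = np.zeros(n_arms)  # cumulative importance-weighted loss estimates

    for t in range(1, n_rounds + 1):
        # Exponential weights over the loss estimates; note that no explicit
        # uniform exploration is mixed into the sampling distribution.
        weights = np.exp(-eta(t) * (cum_loss_est - cum_loss_est.min()))
        p = weights / weights.sum()

        arm = rng.choice(n_arms, p=p)
        observed, losses, reveals = play_round(t, arm)

        # o[i]: probability (under p) that arm i's loss is observed this round.
        o = np.asarray(reveals, dtype=float).T @ p

        # Implicit exploration: the extra gamma(t) in the denominator biases the
        # estimates slightly downward, which keeps their variance under control.
        loss_est = np.where(observed, np.asarray(losses) / (o + gamma(t)), 0.0)
        cum_loss_est += loss_est

    return cum_loss_est
```

FPL-IX, the paper's second algorithm, keeps the same implicit-exploration idea but selects its action via a follow-the-perturbed-leader step and approximates the observation probabilities by geometric resampling; it is omitted from this sketch for brevity.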