Efficient learning by implicit exploration in bandit problems with side observations

Authors: Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism. Both algorithms rely on a novel exploration strategy called implicit exploration, which is shown to be more efficient both computationally and information-theoretically than previously studied exploration strategies for the problem.
Researcher Affiliation | Collaboration | Tomáš Kocák, Gergely Neu, Michal Valko, Rémi Munos. SequeL team, INRIA Lille - Nord Europe, France. {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr. Current affiliation: Google DeepMind.
Pseudocode | Yes | Algorithm 1: EXP3-IX ... Algorithm 2: FPL-IX (see the illustrative sketch after this table).
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not mention using any specific datasets for training or evaluation. It focuses on algorithm design and theoretical regret guarantees.
Dataset Splits | No | The paper is theoretical and does not discuss empirical experiments with dataset splits. Therefore, no training/validation/test splits are provided.
Hardware Specification | No | The paper describes theoretical algorithms and their performance guarantees. It does not conduct empirical experiments, and thus no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical, presenting algorithms and their proofs. It does not mention any specific software dependencies or versions required for implementation or reproduction of empirical results.
Experiment Setup | No | The paper describes theoretical algorithms and their performance. It does not include an experimental setup section with hyperparameters or training settings, as it does not report on empirical experiments.
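To make the Pseudocode entry above concrete, here is a minimal Python sketch of an EXP3-IX-style update, reconstructed from the paper's high-level description rather than copied from Algorithm 1. The environment interface play_round and the schedules eta and gamma are hypothetical placeholders introduced for illustration; the essential ingredient is the implicit-exploration loss estimate, which divides each observed loss by the probability of observing that arm plus a small bias term gamma(t), instead of mixing explicit uniform exploration into the sampling distribution.

```python
import numpy as np

def exp3_ix(n_arms, n_rounds, play_round, eta, gamma, rng=None):
    """Minimal sketch of an EXP3-IX-style learner for bandits with side observations.

    Hypothetical environment interface (not from the paper):
      play_round(t, arm) -> (observed, losses, reveals)
        observed : length-n_arms boolean mask, True for arms whose losses were
                   revealed (the played arm and its neighbours in round t's graph)
        losses   : length-n_arms array of losses in [0, 1]; unobserved entries ignored
        reveals  : n_arms x n_arms boolean matrix, reveals[j, i] True if playing j
                   would have revealed the loss of i (self-loops included),
                   disclosed only after the action is chosen
    eta, gamma : callables returning the learning rate and the implicit-exploration
                 parameter for round t (the paper's tuning couples gamma to eta).
    """
    rng = np.random.default_rng() if rng is None else rng
    cum_loss_est = np.zeros(n_arms)  # cumulative importance-weighted loss estimates

    for t in range(1, n_rounds + 1):
        # Exponential weights over the loss estimates; note that no explicit
        # uniform exploration is mixed into the sampling distribution.
        weights = np.exp(-eta(t) * (cum_loss_est - cum_loss_est.min()))
        p = weights / weights.sum()

        arm = rng.choice(n_arms, p=p)
        observed, losses, reveals = play_round(t, arm)

        # o[i]: probability (under p) that arm i's loss is observed this round.
        o = np.asarray(reveals, dtype=float).T @ p

        # Implicit exploration: the extra gamma(t) in the denominator biases the
        # estimates slightly downward, which keeps their variance under control.
        loss_est = np.where(observed, np.asarray(losses) / (o + gamma(t)), 0.0)
        cum_loss_est += loss_est

    return cum_loss_est
```

FPL-IX, the paper's second algorithm, keeps the same implicit-exploration idea but selects its action via a follow-the-perturbed-leader step and approximates the observation probabilities by geometric resampling; it is omitted from this sketch for brevity.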