reproducibilityindex.ai

A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Authors: Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We present a modiﬁed tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback, which in addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin simultaneously achieves a near-optimal regret guarantee in the stochastic setting with ﬁxed delays. Speciﬁcally, the adversarial regret guarantee is O( TK + pd T log K), where T is the time horizon, K is the number of arms, and d is the ﬁxed delay, whereas the stochastic regret guarantee is O i log(T) + d i log K ) + d K1/3 log K , where i are the suboptimality gaps. We also present an extension of the algorithm to the case of arbitrary delays, which is based on an oracle knowledge of the maximal delay dmax and achieves O( TK + p D log K + dmax K1/3 log K) regret in the adversarial regime, where D is the total delay, and O i log(T) + σmax i log K ) + dmax K1/3 log K regret in the stochastic regime, where σmax is the maximal number of outstanding observations. Finally, we present a lower bound that matches the reﬁned adversarial regret upper bound achieved by the skipping technique of Zimmert and Seldin [2020] in the adversarial setting.
Researcher Affiliation	Collaboration	Saeed Masoudian University of Copenhagen saeed.masoudian@di.ku.dk Julian Zimmert Google Research zimmert@google.com Yevgeny Seldin University of Copenhagen seldin@di.ku.dk
Pseudocode	Yes	Algorithm 1: FTRL with advance tuning for delayed bandit
Open Source Code	No	The paper is theoretical and does not mention providing open-source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not involve the use of datasets for training.
Dataset Splits	No	The paper describes theoretical work and does not involve data splits for validation.
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require hardware specifications.
Software Dependencies	No	The paper is theoretical and does not mention specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup	No	The paper focuses on theoretical analysis and algorithm design, not empirical experiments. Therefore, it does not provide details about an experimental setup, hyperparameters, or training configurations.